Automatic Construction of Conjoint Attributes and Levels from Online Customer … · 2015-07-28 · Automatic Construction of Conjoint Attributes and Levels from Online Customer Reviews

Automatic Construction of Conjoint Attributes and Levels from Online Customer Reviews

Thomas Y. Lee and Eric T. Bradlow∗

April 2007

University of Pennsylvania, The Wharton School

∗ Thomas Y. Lee is an Assistant Professor of Operations and Information Management and Eric T. Bradlow is the K. P. Chao Professor, Professor of Marketing, Statistics, and Education, and Academic Director of the Wharton Small Business Development Center, both at The Wharton School of the University of Pennsylvania. The authors would like to thank Esther Chen, Ellen Ngai, and Sojeong Hong for helping us with the data coding, and Steven O. Kimbrough, Balaji Padmanabhan, Yoram Wind, Paul E. Green, Abba M. Krieger and attendees of the Utah Winter Information Systems Conference and Florida Decision and Information Sciences Workshop for useful suggestions and comments. Please send all correspondence on this manuscript to: Thomas Y. Lee, 573 JMHH, 3730 Walnut Street, Philadelphia, PA 19104; [email protected], tel. 1(215)898-3266, fax. 1(215)898-3664.



Abstract

Conjoint analysis continues to be an area of active research due to its enormous (and often delivered) promise of improved marketing decision-making. However, despite much methodological progress, the literature has remained curiously silent on a fundamental design question: "How does one choose the attributes and levels in the first place?"

In this paper, we present a method to support conjoint study design by automatically eliciting an initial set of attributes and levels from online customer reviews. While existing computer science research aims to learn attributes from reviews, our approach is uniquely motivated by the conjoint study design challenge: how to identify both attributes and their associated levels. Our proposed method has at least three advantages. First, we generate attributes and levels using the language of the consumer rather than that of designers and manufacturers. Second, the approach runs automatically. Automated analysis supports the trend towards shorter product lifecycles and rapid prototyping. Third, we support rather than supplant managerial judgment. The method is parameterized to allow survey designers to vary the number of attributes and/or levels that are generated. Managers can choose to use our method either in a stand-alone manner or as a point of departure for the surveys and focus groups used in common practice.

Key words: conjoint analysis; online customer reviews; semi-structured text; text mining



1. Introduction

Conjoint analysis has been universally recognized among academics and practitioners alike as one of the most celebrated tools in Marketing. In part, this is due to the enormous “promise” of its results, including applications in new product introductions (Wittink and Cattin 1989; Michalek et al. 2005), optimal product repositioning (Moore et al. 1999) and pricing (Goldberg et al. 1984), and customer segmentation (Green and Krieger 1991), to name just a few. Such power makes many Marketing academics and practitioners think of conjoint analysis as the “Gold at the end of the rainbow”; and, sometimes it is.

However, how often have those of us who have either implemented or taught conjoint methods stated: “Choose the attributes and levels wisely, remember Garbage In – Garbage Out!”? This warning acknowledges that conjoint analysis depends on representing a customer’s utility as an agglomeration of preferences for the underlying attribute levels that have been selected. This dependence holds regardless of the response format (choice-based, ratings-based, ranking-based, constant sum, or self-explicated) or the method for determining the profiles (Huber and Zwerina 1996; Moore et al. 1998; Toubia et al. 2003; Evgeniou et al. 2005).

Surprisingly, despite its universally recognized importance, there is little extant research to guide attribute and level selection (Wittink et al. 1982). There is some literature on the sensitivity of results to changes in attribute selection, the omission of an important attribute, level spacing, etc. (Green and Srinivasan 1978). But how does one generate these attributes and potential levels in the first place?

Both the academic literature (in the form of textbook chapters; Lehmann et al. 1997) and the practitioner literature typically set the initial attributes and levels using some ad-hoc combination of (i) qualitative research such as managerial or customer interviews, (ii) focus groups, and (iii) open-ended surveys (see Figure 1). Practitioners may iterate over this ad-hoc combination a few times based on pre-tests, and may attempt to validate their study design using actual current products and share data; however, considerable uncertainty and trepidation often remain.

Our automated procedure uses free, online customer reviews to address this question. The impact of customer reviews on consumer behavior has long been studied (Eliashberg and Shugan 1997; Chevalier and Mayzlin 2003); Dellarocas (2003) and Ghose et al. (2006) explore how reviews reflect or shape a seller's reputation; and Chen and Xie (2004) study the implications of customer reviews for marketing strategy. However, there has been comparatively little work on what marketers might learn from the same reviews for purposes of experimental design. To elicit conjoint attributes and levels, we develop a novel approach that derives them from online customer reviews.

We empirically validate our approach on reviews for digital cameras from Epinions.com by comparing automatically induced attributes and levels (using our method) to those used in existing print and online retail buying guides.1 The choice of digital cameras was not random; it reflects the category's common use in marketing conjoint studies (Bradlow et al. 2004; Netzer and Srinivasan 2006). This evaluation highlights three characteristics of our approach that we believe are worth noting.

1 In situations where many conjoint studies have already been run, one could also validate the output of the automatic procedure by comparing it to the attributes and levels used in past studies.

Figure 1: The Conjoint Design Process (boxes: Managerial Judgement, Interviews, Focus Groups, Surveys; Attributes and Levels; Conjoint Study)

First, we generate attributes and levels using the "language of the consumer" rather than that of product designers and manufacturers. The language of the consumer extends not only to the attributes but also to the levels (the level of detail) at which customers discuss products. Customer terminology does not always match expert-generated buying guides. Rather than treating these differences as errors, managers may see them as an opportunity: automated analysis can identify mismatches between how product manufacturers and their customers describe products.

Second, the approach runs automatically. Both consumer and manufacturer preferences evolve over time. Automated analysis enables firms to rapidly process large numbers of customer reviews, possibly from different sources and for different product categories. Automation also supports the trend towards shorter product lifecycles (Van den Bulte 2000) by facilitating the rapid updating of conjoint designs. Our automated approach requires no hand-labeled training data and makes no domain-specific assumptions about particular products. By contrast, much of the prior work related to learning concepts from text uses supervised machine learning methods that require hand-labeled training data (Nasukawa and Yi 2003; Hu and Liu 2004; Liu et al. 2005; Popescu and Etzioni 2005).

Third, we support rather than supplant managerial judgment. We do not aim to eliminate the "human side" of conjoint attribute and level construction. Indeed, as our results in Section 5 indicate, fully automated processing can lead to attribute and level sets that are excessively large. Managers may intervene in the process by setting parameters such as thresholds on the number of levels or through judicious pruning of results. Once completed, the automated results serve as input to conjoint study design. As shown in Figure 1, managers may use the results as a point of departure for initial survey and focus group design, bringing the voice of the consumer further into the process.

As a brief overview, our approach is summarized in Figure 2. We begin with the set of all online reviews in a product category over a specified time frame. For example, in this paper, we consider the reviews for all digital cameras on Epinions.com. While these data are freely accessible, we acknowledge that selecting conjoint attributes and levels from reviews of existing products does limit the size and scope of the attribute and level space. However, as elaborated upon in the discussion below, by aggregating the reviews of all products in the space, our approach encourages the consideration of previously untried attribute and level combinations, and, because new reviews arrive continuously, it is easily updated. In addition, selectively expanding the definition of the initial product space further expands the range of reviews from which prospective attributes and levels are drawn.

In our approach, each online customer review is summarized by the reviewer’s submitted list of Pros and Cons (see Figure 3). Our approach exploits the co-occurrences of words within these list-based summaries. While some review sites (e.g. Amazon.com) do not provide user-authored Pro and Con summaries, many, including Epinions.com, BizRate, and CNet, do (Liu et al. 2005). Exploiting the structure provided by Pro and Con lists allows us to avoid numerous complexities of automated language processing; however, numerous challenges remain. As many of the methods that we use are unfamiliar to the Marketing literature, we next provide a brief overview of the data collection methods and challenges that are detailed in Section 3 and related appendices. In addition, a glossary of terms is included as Appendix 1.

The phrases that comprise each review summary are transformed into vectors of words. Each Pro or Con list is decomposed into separate phrases, where each phrase refers to a single product attribute. For example, the first (Pro) list in the first review of Figure 3 produces the phrases "Easy to use," "zoom," and "panorama." The second (Con) list yields the single phrase "8 mb SmartMedia." Simple linguistic transformations, detailed below, normalize words for tense (past versus present) and number (plural versus singular), among other variations. Each phrase in the Pro/Con list is thus reduced to a vector of words; the vectors from all reviews are combined into a single phrase × word matrix.

Figure 2: Process Overview: Learning From Reviews (panels: Product Reviews; Phrase x Word; Word Graph; Clusters 1 … n; Cliques 1 … m; Attribute Dimensions and Levels A1 … Al, with example attributes Zoom, Panorama, and Memory, the last with Type: SmartMedia, Capacity: 8, Units: mb)

The phrase vectors from all reviews (each row of the matrix) are then clustered based upon their Euclidean distance in the vector space of words. For example, the phrase vector consisting of "Easy," "to," and "use" is as distant from "zoom" as it is from "panorama." By contrast, the vector for the phrase "ease of use" is quite similar to the vector for "easy to use," because standard linguistic transformations equate "ease" and "easy" as forms of the same root word. Each resulting cluster of vectors (phrases) is taken to refer to a single product attribute of interest. Managers may then either pre-select a target number of initial attributes by choosing clustering parameters (as is commonly done in k-means procedures) or automatically search for a statistically "optimal" number of attribute clusters (Lehmann et al. 1997). Because the clusters themselves are unnamed, we heuristically name each cluster using the three most frequent words that define it, similar to what is commonly done in cluster profiling.
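As a minimal sketch of the distance comparison above, the Python snippet below builds binary phrase vectors over a five-word vocabulary and computes Euclidean distances. The binary weighting, the vocabulary, and the hypothetical `ROOTS` and `STOP_WORDS` tables (which map "ease" to "easy" and drop "of" and "to") are illustrative assumptions; the paper's actual vectors use the weighted entries described in Section 3.

```python
import math

def phrase_vector(words, vocabulary):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in the phrase."""
    present = set(words)
    return [1.0 if term in present else 0.0 for term in vocabulary]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical normalization tables: map a surface form to a shared root, drop stop words.
ROOTS = {"ease": "easy"}
STOP_WORDS = {"of", "to"}

def normalize(words):
    return [ROOTS.get(w, w) for w in words if w not in STOP_WORDS]

vocab = ["easy", "to", "use", "zoom", "panorama"]
easy_to_use = phrase_vector(["easy", "to", "use"], vocab)
zoom = phrase_vector(["zoom"], vocab)
panorama = phrase_vector(["panorama"], vocab)
print(euclidean(easy_to_use, zoom), euclidean(easy_to_use, panorama))  # 2.0 2.0

# After normalization, "ease of use" and "easy to use" map to the same vector.
a = phrase_vector(normalize(["ease", "of", "use"]), vocab)
b = phrase_vector(normalize(["easy", "to", "use"]), vocab)
print(euclidean(a, b))  # 0.0
```

With binary vectors, any two phrases that share no words are equally far apart, which is why "easy to use" cannot be closer to one of "zoom" or "panorama" than to the other.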

Once phrases are clustered into attributes, the challenge unique to conjoint analysis is to elicit attribute levels, where appropriate. A product attribute defined by phrases such as "ease of use" may have no levels. Other phrase clusters, however, may actually contain multiple attribute dimensions, each with a distinct set of levels. For example, the cluster of phrases that includes "8 mb Smart media card included" and "8 mb SmartMedia" represents the attribute "memory." However, these phrases include multiple dimensions of memory. There is memory capacity ("8" as opposed to "4" or "16" or "32"); there are the units of memory ("mb" versus "megabyte"); and there are types of memory ("SmartMedia" versus "CompactFlash"). These few examples reveal some of the many challenges.

Figure 3: Pro/Con Review Summaries from Epinions.com

To elicit levels, we first identify the dimensions of each attribute and then identify the set of values that each dimension may take. We begin with all of the phrases in a single attribute cluster. For example, "8 mb Smart media card included" and "8 mb SmartMedia" are in the same cluster. Phrases from the reviews of other digital cameras in the same cluster include "Only 8 mb smart media card included" and "Only a 4 mb card." A novel math programming algorithm then assigns the words (or numbers) in each phrase to separate dimensions (e.g. "8" and "4" are assigned to the same dimension). In a few simple cases such as hyphenation, variants such as "Smart media" and "Smart-media" are all recognized as instances of "SmartMedia" and therefore assigned together.

A single dimension now consists of a set of words (or numbers). The words (or numbers) assigned to a single dimension are organized into levels based upon an a priori, user-specified parameter. If there are more values than the manager-specified target number of levels, numerical values are binned so as to balance the range of values covered by each level. Note that any binning function could be substituted. Categorical values are clustered based upon the distributions of their related attribute dimensions (distributional clustering). For example, if there are more digital camera memory types than the target number of levels, memory types might be categorized based upon their memory capacity.
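The text leaves the binning function open. A minimal sketch, assuming equal-width bins (one reading of "balancing the range of values covered by each level"), a hypothetical function name `equal_width_levels`, and a hypothetical list of memory capacities:

```python
def equal_width_levels(values, n_levels):
    """Assign each numeric value to one of n_levels bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return {v: 0 for v in values}  # only one distinct value: a single level
    width = (hi - lo) / n_levels
    # min(...) keeps the maximum value in the top bin instead of overflowing to index n_levels.
    return {v: min(int((v - lo) / width), n_levels - 1) for v in values}

capacities = [4, 8, 16, 32, 64, 128]  # hypothetical memory capacities in MB
print(equal_width_levels(capacities, 3))
# {4: 0, 8: 0, 16: 0, 32: 0, 64: 1, 128: 2}
```

With skewed values such as these, equal-width bins place most capacities in the lowest level; a log-scale or quantile binning function could be substituted, as the text allows.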

In the remainder of this paper, after surveying the related literature in Section 2, we review the preprocessing and underlying data models in Section 3 and describe our algorithmic process in Section 4. Our data set and evaluation are detailed in Section 5. A discussion of results and future work concludes in Section 6.

2. Related Work

The task of learning conjoint attributes and levels from text is closely related to several different research streams in the data- and text-mining literature. Early research in "sentiment analysis" attempted to classify customer comments based upon their general tenor (e.g. is the customer expressing happiness or dissatisfaction?) (Turney 2002). These models use Natural Language Processing (NLP) to identify word classes (e.g. positive words) that appear in the text of each review. In a process called supervised learning, a representative set of reviews is manually labeled with the appropriate classes (e.g. which express positive sentiment and which negative). The representative sample is used to tune the model parameters so as to correctly predict a review's class label based upon its word composition. Subsequent efforts have treated the overall sentiment (positive or negative) like a multi-attribute utility and attempted to identify the opinions paired with each attribute (Turney 2002; Nasukawa and Yi 2003; Hu and Liu 2004; Ghose et al. 2006).

Our work extends the prior literature on sentiment research in two ways. First, we are interested in a technique that is easily transferable across multiple product categories. Therefore, we do not rely upon sophisticated NLP techniques that identify the grammatical parts of speech of different words. Rather than using supervised techniques that require manual preparation of training documents, we limit human intervention to parameter settings, where managers may set targets for the number of attributes and levels. Second, we extend the prior work on learning product attributes with techniques for eliciting levels. This involves not only discovering the words that name each attribute but also clustering the words (and numbers) that describe it.

"Ontology induction" techniques combine statistical methods with knowledge of grammar rules to learn the vocabulary that characterizes a specific document collection. An ontology is a structured vocabulary that contains the words describing a domain of interest as well as the relationships between those words. Examples of relationships between words include: an "SLR" is-a-type-of "digital camera;" a "digital camera" is-comprised-of "lens" and "battery" (among other things). Ontology induction is the process of automatically learning an ontology from text documents that describe the domain. In this context, our task is to learn the words that describe product attributes and levels by processing customer reviews about that product.

In ontology induction, a human expert typically begins with a seed ontology: either a general reference ontology that lists common words and relationships (Missikoff and Navigli 2002) or a pre-existing, domain-specific reference ontology (Modica et al. 2001; Cecchini 2005). Techniques for learning linguistic patterns (Hearst 1992), database integration (Doan et al. 2003), and frequent item sets (Borgelt and Kruse 2002) are then applied to grow and refine the starting seed in ways that cluster different words that describe the same attribute, distinguish between distinct attributes, and identify levels within a single attribute (Maedche and Staab 2000; Popescu et al. 2004; Popescu and Etzioni 2005).

Our unsupervised approach neither assumes a seed ontology nor leverages explicit structure such as HTML or grammatical syntax. Indeed, Pro/Con summaries are simply lists of phrases with no associated linguistic context; Liu et al. (2005) demonstrate that even with a training set, (supervised) techniques to learn relationship patterns between words perform markedly worse in the context of such review summaries. Our constrained optimization approach dispenses with a training set and removes the need for knowledge of grammatical rules (Lee 2005).

Finally, traditional techniques for clustering categorical data make assumptions about the data structure and sample size that are generally inappropriate for analyzing customer review text. We briefly review this literature as well as the application of graph-based methods, recognizing that other approaches that rely on human intervention may also be of use to marketing practitioners, albeit perhaps less scalable than our approach.

Whereas sentiment analysis and ontology induction techniques learn product attributes, categorical clustering techniques assume that each customer comment about a distinct product attribute is stored in a separate row of a relational database table. Within a table column, binning strategies (Han and Fu 1994) can group values into a limited number of levels. Across table columns, measures of how the values in one column co-occur with values in a second column (Han and Fu 1994), or more general classification rules (Suryanto and Compton 2000), are then used to infer attribute-property relationships. For example, memory capacity (a column) might include "4," "8," and "16" and co-occur with the unit property "mb." Thus, the product attribute memory is decomposed into the properties capacity and units; furthermore, values of capacity can then be distributed across a user-specified2 maximum number of levels.

The number and content of the attribute levels are then generated by clustering the observed range of attribute values, in our case from the Pro/Con lists. Using the database metaphor, this corresponds to clustering the values of a single table column. Levels are then hierarchically clustered based upon the similarity of their respective probability distributions over the other columns within the same table (Baker and McCallum 1998; Dhillon et al. 2002). For example, we might decompose the product attribute battery life into duration and a modifying adverb. The duration might include "good," "bad," "short," and "long." Associated adverbs might include "somewhat," "terribly," and "awfully." Based upon the distribution of their co-occurrences, we would expect "terribly" and "awfully" to be clustered as synonyms.
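To make the distributional idea concrete, here is a minimal sketch that compares adverbs by the Jensen-Shannon divergence of their co-occurrence distributions over durations. The counts are invented for illustration, and Jensen-Shannon is our choice of divergence; the cited methods (Baker and McCallum 1998; Dhillon et al. 2002) cluster hierarchically using their own information-theoretic criteria.

```python
import math

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: symmetric and finite even when supports differ."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical co-occurrence counts of each adverb with the durations
# "good", "bad", "short", "long" (in that order).
cooccurrence = {
    "terribly": [0, 5, 4, 0],
    "awfully": [0, 3, 3, 1],
    "somewhat": [4, 0, 1, 3],
}
dists = {word: normalize(c) for word, c in cooccurrence.items()}

def nearest(word):
    """Return the other adverb whose co-occurrence distribution is closest to `word`'s."""
    others = [w for w in dists if w != word]
    return min(others, key=lambda w: js(dists[word], dists[w]))

print(nearest("terribly"))  # awfully
```

A hierarchical procedure would repeatedly merge the closest pair under such a divergence until the target number of levels is reached.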

Although graph-based methods have not been applied to product attributes and levels within reviews as we do here, graphs have been used to cluster and manage categorical data (Gibson et al. 1998; Ganti et al. 1999; Zaki and Peters 2005). In the context of words and phrases in reviews, we might treat every word as a node in the graph. Edges would represent the relationship between words that appear together in a Pro or Con review phrase, and would be weighted by the number of reviews in which the two words appear together. For example, if "8 MB Compact Flash" appeared in three different reviews and "8 MB Memory Stick" appeared in four different reviews, the weight of the edge between "8" and "MB" would be seven, the edge between "MB" and "Compact" would be three, and the edge between "8" and "Compact" would also be three.

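The worked example above can be reproduced with a short sketch, assuming each phrase is given together with the number of reviews in which it appears and that words are split on whitespace; the helper names `edge_weights` and `weight` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def edge_weights(phrase_counts):
    """Weight each word pair by the number of reviews whose phrase contains both words."""
    weights = Counter()
    for phrase, n_reviews in phrase_counts.items():
        words = sorted(set(phrase.split()))  # sorted, so each undirected edge has one key
        for pair in combinations(words, 2):
            weights[pair] += n_reviews
    return weights

def weight(weights, a, b):
    """Look up an undirected edge regardless of argument order (0 if absent)."""
    return weights[tuple(sorted((a, b)))]

w = edge_weights({"8 MB Compact Flash": 3, "8 MB Memory Stick": 4})
print(weight(w, "8", "MB"), weight(w, "MB", "Compact"), weight(w, "8", "Compact"))  # 7 3 3
```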
Rather than working directly from review text, existing graph-based methods assume that all of the words are preprocessed into a single database table. Every row represents a different phrase and every column represents a different attribute or level. Furthermore, existing methods assume that the table is complete (i.e., there are no empty cells in the table). Unfortunately, before analysis, there is no simple way of determining how many different product attributes and levels customers will mention in their reviews, and hence no way of setting the correct number of table columns. Moreover, a comment about memory like "Only 8 mb Smart media card included" (see Figure 2) uses different columns than a comment like "2x digital zoom." Thus, our approach clusters customer comments into separate tables, one for each product attribute. Discovering the dimensions of each product attribute (i.e., the number of columns in each attribute table) is a key difference between our approach and existing methods. In addition, our approach adjusts for blank cells, which appear when customers commenting on the same product attribute do not refer to the same attribute dimensions. For example, one customer might comment on memory capacity ("8" vs. "16" MB) while another might comment on memory type ("Compact Flash" vs. "Smart Media"). To the degree that we can learn the different tables and their corresponding columns, graph-based techniques offer a complementary strategy for learning attribute levels. Having reviewed some alternative approaches to the problem, we revisit our approach, summarized in the introduction, in greater detail.

2 A manager-specified maximum number of levels is important for making any approach practically usable. One can test the approach for a different maximum value.

3. Preprocessing

In this section, we detail the preprocessing and the underlying data models used to manipulate the words and phrases drawn from customer reviews. Regarding the list of Pros and Cons as a summary of the corresponding review, we focus only on the phrases in each list. We hypothesize that each phrase comments on a single attribute. Whether a product attribute is listed as a Pro or a Con, we process all phrases in the same way.

To extract attribute phrases from the customer input, we assume that each Pro or Con entry is a list of phrases separated by standard list separators, including commas, slashes, and semicolons. Within a single line of input, we count separators and assume that multiple instances (e.g. two or more commas) correspond to a list of candidate attribute phrases.
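A minimal sketch of this splitting rule follows. The threshold of two separators follows the example in the text; treating an entry with fewer separators as a single phrase is our assumption, as is the function name `split_entry`.

```python
import re

SEPARATORS = ",/;"  # commas, slashes, and semicolons, as listed in the text
SPLIT_PATTERN = "[" + re.escape(SEPARATORS) + "]"

def split_entry(entry, min_separators=2):
    """Split one Pro or Con entry into candidate attribute phrases.

    Assumption: the entry is treated as a list only when it contains at least
    `min_separators` separator characters; otherwise it is kept as one phrase.
    """
    n_separators = sum(entry.count(s) for s in SEPARATORS)
    if n_separators < min_separators:
        return [entry.strip()]
    return [part.strip() for part in re.split(SPLIT_PATTERN, entry) if part.strip()]

print(split_entry("Easy to use, zoom, panorama"))  # ['Easy to use', 'zoom', 'panorama']
print(split_entry("8 mb SmartMedia"))  # ['8 mb SmartMedia']
```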

To clean the set of resulting attribute phrases, we discard those candidate phrases that contain non-alphanumeric characters or punctuation that was not used as a list separator. Examples of discarded phrases taken from the digital camera product attribute Computer Requirements include "book(tm)", "windows®98 second edition (se)", and "windows 98*". The intuition is that, depending upon the starting feature concept, our data set ranges from several hundred to several thousand phrases; therefore, we can safely discard outlier phrases. Discarding does raise the question of an optimal sample size, which we revisit below.
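A minimal sketch of this filter, assuming list separators have already been removed by the splitting step and that only letters, digits, and whitespace are allowed (the function name `is_clean` is hypothetical):

```python
def is_clean(phrase):
    """Keep a phrase only if every character is a letter, a digit, or whitespace."""
    return all(ch.isalnum() or ch.isspace() for ch in phrase)

candidates = [
    "book(tm)",
    "windows®98 second edition (se)",
    "windows 98*",
    "windows 98",
    "8 mb SmartMedia",
]
kept = [p for p in candidates if is_clean(p)]
print(kept)  # ['windows 98', '8 mb SmartMedia']
```

Note that this strict rule would also discard hyphenated forms such as "Smart-media"; the text does not say how its filter treats hyphens, so a real implementation might need an exception for them.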

Each phrase is itself composed of its component words. As a standard step in text processing and information retrieval, we prune all stop words (a standard list of articles, conjunctions, prepositions, etc.) from each phrase (Salton and McGill 1983). For example, after removing stop words, the phrase "Quality of Photos" becomes "Quality Photos" and "Only 8 mb Smart media card included" becomes "8 mb Smart media card included."
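A minimal sketch of stop-word pruning. The small `STOP_WORDS` set is a hypothetical stand-in for a standard list; "only" is included so that the second example above is reproduced.

```python
# Hypothetical stop-word list; a real system would use a standard list.
STOP_WORDS = {"a", "an", "and", "of", "only", "the", "to", "with"}

def remove_stop_words(phrase):
    """Drop stop words (case-insensitively) while preserving the remaining words' order."""
    return " ".join(w for w in phrase.split() if w.lower() not in STOP_WORDS)

print(remove_stop_words("Quality of Photos"))  # Quality Photos
print(remove_stop_words("Only 8 mb Smart media card included"))  # 8 mb Smart media card included
```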

Words are then normalized using a standard process called stemming (Salton and McGill 1983). Stemming attempts to find equivalences between the singular, plural, past-tense, and present-tense forms of the individual words that consumers use to describe product attributes and levels. Rather than requiring knowledge of grammar or semantics, stemming is a simple, approximate technique for discovering the root forms of words.
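The text does not name a specific stemmer; the Porter stemmer (available, for example, in NLTK) is a common choice. To keep the sketch dependency-free, the hypothetical `crude_stem` below strips a single common suffix. It is a deliberately simplified stand-in, not the Porter algorithm, and it will over- or under-stem many English words.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order; the first match wins

def crude_stem(word, min_stem=3):
    """Lowercase the word and strip one suffix if at least `min_stem` characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w

words = ["included", "includes", "photos", "photo", "zooming", "zoom", "mb"]
print([crude_stem(w) for w in words])
# ['includ', 'includ', 'photo', 'photo', 'zoom', 'zoom', 'mb']
```

As the output shows, stems need not be dictionary words; they only need to map related forms to the same token.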

Finally, phrases of normalized words are reduced to their underlying "bag-of-words" representation, which eliminates word order (Salton and McGill 1983). Eliminating word order allows us to equate different grammatical permutations of the same pruned, normalized phrases. For example, "Includes an 8 mb Smart media card" and "Only 8 mb Smart media card included" are identical in the pruned, normalized, bag-of-words representation.
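In code, a bag of words can be sketched as a multiset of tokens. The token lists below are shown already pruned and stemmed; the stem "includ" assumes a simple suffix-stripping stemmer and is illustrative.

```python
from collections import Counter

def bag_of_words(tokens):
    """Order-insensitive multiset of tokens; counts are kept for later term weighting."""
    return Counter(tokens)

p1 = ["includ", "8", "mb", "smart", "media", "card"]  # "Includes an 8 mb Smart media card"
p2 = ["8", "mb", "smart", "media", "card", "includ"]  # "Only 8 mb Smart media card included"
print(bag_of_words(p1) == bag_of_words(p2))  # True
```

A `Counter` is used instead of a `set` so that repeated words within a phrase still contribute to the term frequencies in the phrase × word matrix.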

Our process calls for clustering phrases based upon the product attributes that each phrase describes. To facilitate phrase clustering, we transform the list of phrases in bag-of-words form into a phrase × word matrix. If i = 1, …, I indexes over phrases and j = 1, …, J indexes over words, each entry (i, j) of the matrix measures the importance of word j in characterizing or defining the product attribute represented by phrase i. Every row of the matrix represents the corresponding phrase in the vector space of words, the familiar vector-space model used in information retrieval. We determine the importance of a word to a particular product attribute by using a derivative of the TF-IDF (Term Frequency-Inverse Document Frequency) metric developed for information retrieval (Salton and McGill 1983).

First, the number of times that a word j appears in a phrase i is multiplied by the number of times the phrase i appears in the set of all review phrases. This word count is then adjusted by the distribution of word j over all phrases: words that appear in too many different phrases are less likely to uniquely characterize a single product attribute and hence are discounted more heavily than words that appear in fewer phrases.
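A minimal sketch of this weighting, assuming a standard logarithmic IDF term log(I / df_j), where I is the number of distinct phrases and df_j is the number of distinct phrases containing word j. The paper's exact formula is given in Appendix 2.1 and may differ; the function name, phrases, and counts below are hypothetical, and the matrix is stored sparsely as a dict of dicts.

```python
import math
from collections import Counter

def weight_matrix(phrase_counts):
    """Phrase x word weights: tf(i, j) * n(i) * log(I / df(j)).

    phrase_counts maps each distinct phrase (a tuple of normalized words)
    to n(i), the number of times it occurs across all reviews.
    """
    phrases = list(phrase_counts)
    n_phrases = len(phrases)
    df = Counter(word for p in phrases for word in set(p))  # distinct phrases per word
    matrix = {}
    for p in phrases:
        tf = Counter(p)
        matrix[p] = {w: tf[w] * phrase_counts[p] * math.log(n_phrases / df[w]) for w in tf}
    return matrix

counts = {("8", "mb", "card"): 2, ("16", "mb"): 1, ("optic", "zoom"): 3}
m = weight_matrix(counts)
print(m[("16", "mb")])
```

Under this form, "mb" (in two of three phrases) is weighted below "16" (in one), and a word present in every phrase would receive a weight of zero.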

Second, we further adjust a word’s importance by using frequency statistics from a second, unrelated product domain. The phrase × word matrix includes some sentiment words such as “good” or “great,” which occur with high frequency in a limited number of phrases, leading to deceptively high importance values. However, sentiment words are likely to appear in reviews for unrelated products, whereas words characterizing product-specific attributes are not. Therefore, we discount our initial importance statistics using the phrase × word matrix constructed from reviews of an unrelated product. Details on calculating word importance appear in Appendix 2.1.
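Because the discount formula appears only in Appendix 2.1, the following is a hypothetical stand-in: an IDF-style factor computed on Pro/Con phrases from an unrelated category, which would multiply each word's importance from the first adjustment. The function name, the smoothing by +1, and the example phrases are all assumptions.

```python
import math

def cross_domain_factor(word, unrelated_phrases):
    """Hypothetical discount: an IDF computed on phrases from an unrelated product.

    Words common in the unrelated domain (e.g. sentiment words) get a factor
    near zero; words absent from it get the largest factor, log(U + 1).
    """
    u = len(unrelated_phrases)
    df_u = sum(1 for p in unrelated_phrases if word in p)
    return math.log((u + 1) / (df_u + 1))

# Hypothetical Pro/Con phrases from an unrelated category, as word sets.
unrelated = [{"great", "price"}, {"good", "motor"}, {"great", "design"}, {"good", "value"}]
for word in ["great", "zoom"]:
    print(word, round(cross_domain_factor(word, unrelated), 3))
```

Here "great" is discounted relative to "zoom" because it recurs in the unrelated category, which is the intended effect described above.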

While the phrase × word matrix captures word frequencies within a phrase, it does not fully capture the co-occurrences of words that reappear in different phrases of different reviews. As an example, in Figure 4a, we begin with phrases from several different reviews. When we consider only the phrases that apply to one product attribute (Figure 4b), it is easy to see how word co-occurrences can help align individual words (or numbers) into separate dimensions (Figure 4c).

To automatically align words into dimensions, we model the phrases of a particular product attribute as a graph. Every word is a node in the graph, and every edge between two nodes denotes the co-occurrence of the corresponding words within a phrase (see Figure 5a). Because words within the same dimension never co-occur in a single phrase (e.g. in Figure 4, a memory card is never both 8 mb and 4 mb), our word-phrase graph exactly satisfies the definition of an n-partite graph, where the parameter n is the total number of attribute dimensions. The partite property guarantees that there are no edges between nodes within the same partition (i.e., no edges between words in the same attribute dimension). By extension, we reason that the space of all possible attribute-level combinations forms a complete n-partite graph, in which every node in one partition is connected to every node in every other partition (see Figure 5b). Details on the graph model and its n-partite property are provided in Appendix 2.2.
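The text assigns words to dimensions with a math-programming algorithm that is not reproduced here. As a narrower sketch, the snippet below builds the co-occurrence graph for the Figure 4 memory phrases (after stop-word removal, with "incl" standing in for the stemmed "included", as in Figure 5) and checks whether a proposed assignment of words to dimensions is n-partite, i.e., has no edge inside any dimension. The six-dimension assignment is our hypothetical reading of Figure 4c.

```python
from itertools import combinations

def cooccurrence_edges(phrases):
    """Undirected edges between every pair of distinct words sharing a phrase."""
    edges = set()
    for phrase in phrases:
        for a, b in combinations(sorted(set(phrase)), 2):
            edges.add((a, b))
    return edges

def is_valid_partition(edges, dimensions):
    """True if no edge joins two words assigned to the same dimension (n-partite)."""
    dim_of = {word: d for d, words in enumerate(dimensions) for word in words}
    return all(dim_of[a] != dim_of[b] for a, b in edges)

memory = [
    ["8", "mb", "card"],
    ["16", "mb"],
    ["4", "mb", "card"],
    ["8", "mb", "smart", "media", "card", "incl"],
]
edges = cooccurrence_edges(memory)
dims = [{"4", "8", "16"}, {"mb"}, {"smart"}, {"media"}, {"card"}, {"incl"}]
print(is_valid_partition(edges, dims))  # True
bad = [{"4", "8", "16", "mb"}, {"smart", "media", "card", "incl"}]
print(is_valid_partition(edges, bad))  # False
```

The capacities "4", "8", and "16" never share a phrase, so they can occupy one partition; "16" and "mb" do, so the second assignment fails.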

4. Analysis

Having established the underlying data preparation and data models, we revisit the steps from Figure 2: (i) phrases are clustered into product attributes (Section 4.1); (ii) attributes are divided into their constituent dimensions, and the words in each phrase are aligned with their appropriate dimension (Section 4.2); and (iii) each dimension (property) is divided into levels (Section 4.3). The section concludes with a number of refinements that attempt to address noise within the process (Section 4.4).

Figure 4: From Phrases to Attributes to Attribute Dimensions (panels: a. Phrases From Online Reviews; b. Phrases for the Attribute "Memory"; c. Dimensions of the Attribute "Memory")

Figure 5: Word Co-occurrence as an N-Partite Graph (panels a and b; nodes: 4, 8, 16, mb, card, smart, media, incl)

4.1. Clustering phrases into product attributes

In the first step, we begin with the vector-space representation of phrases drawn from the Pro/Con review summaries. More formally, given the phrase × word matrix over the set I of phrases and the set J of words, we seek to separate the phrases into a set $C = \{c_1, \dots, c_k\}$ of k mutually exclusive and exhaustive concept clusters, i.e. $\bigcup_{i=1}^{k} c_i = I$ and $c_i \cap c_j = \emptyset$ for all $i \neq j$. Of course, the clustering necessarily depends on, and is susceptible to, the quality of phrase parsing. Poor parsing (e.g. a single phrase that combines multiple features) can introduce noise into the resulting clusters. Our objective, however, is to capture all phrases corresponding to a particular feature in one cluster. A feature or concept can be distinguished at somewhat arbitrary levels of granularity: "digital zoom" and "optical zoom" could be clustered as distinct features or aggregated as the single concept "zoom." One design choice, therefore, is the desired degree of granularity. Coarser granularity typically yields fewer unique concepts (and hence conjoint attributes) that may be less distinct; a finer grain yields the reverse. Clustering algorithms are typically distinguished by how they measure similarity and by the criterion they use to separate clusters. In this work, consistent with our matrix representation of phrases and words, we measure the similarity of two phrase vectors by the cosine of the angle between them, and we cluster the rows of the phrase × word matrix with the well-studied k-means algorithm under this similarity.

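The quality, QC, of a k-means clustering C is the sum of the cosine similarities between each vector in a cluster and that cluster's centroid; this is the quantity the algorithm seeks to maximize. Following Zhao and Karypis (2002), when the phrase vectors have unit length, this metric simplifies to the sum of the lengths of the clusters' composite vectors:

$$Q_C = \sum_{c_i \in C} \sum_{v \in c_i} \cos\bigl(v, \mathrm{centroid}(c_i)\bigr) = \sum_{c_i \in C} \bigl\lVert \mathrm{composite}(c_i) \bigr\rVert, \qquad \text{where } \mathrm{composite}(c_i) = \sum_{v \in c_i} v \qquad (1)$$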
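As a sanity check on Eqn (1), the sketch below (hypothetical helper names; toy phrases taken from Table 1 and Figure 4) computes QC both ways for a fixed two-cluster labeling of a binary phrase × word matrix whose rows are scaled to unit length. The two totals agree because, for unit vectors, the sum of cos(v, centroid) over a cluster equals the length of that cluster's composite vector.

```python
import numpy as np

def unit_rows(m):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def qc_from_cosines(V, labels):
    """Left side of Eqn (1): sum of cos(v, centroid) over every vector v."""
    total = 0.0
    for c in np.unique(labels):
        members = V[labels == c]
        centroid = members.mean(axis=0)
        # each v has unit length, so cos(v, centroid) = v . centroid / |centroid|
        total += float((members @ centroid).sum() / np.linalg.norm(centroid))
    return total

def qc_from_composites(V, labels):
    """Right side of Eqn (1): sum of the lengths of the composite vectors."""
    return float(sum(np.linalg.norm(V[labels == c].sum(axis=0)) for c in np.unique(labels)))

phrases = ["long 6x optic zoom", "standard 3x optic zoom", "nice optic zoom",
           "16 mb", "4 mb card", "8 mb card"]
vocab = sorted({w for p in phrases for w in p.split()})
X = np.array([[1.0 if w in p.split() else 0.0 for w in vocab] for p in phrases])
V = unit_rows(X)
labels = np.array([0, 0, 0, 1, 1, 1])
print(qc_from_cosines(V, labels), qc_from_composites(V, labels))  # equal up to rounding
```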

Pro/Con review summary | Tokenized phrase | P1 | P2 | P3 | P4
Clique | | nice | 6x | optic | zoom
Zoom | zoom | | | | zoom
Long 6x optical zoom | long 6x optic zoom | long | 6x | optic | zoom
standard 3x optical zoom | standard 3x optic zoom | standard | 3x | optic | zoom
nice optical zoom | nice optic zoom | nice | | optic | zoom
6x zoom is nice | 6x zoom nice | nice | 6x | | zoom
5x optical zoom | 5x optic zoom | | | optic | zoom

Table 1: Using a maximal clique for logical assignment

Because k-means is known to be highly sensitive to its initial conditions, we run the algorithm ten times, each time starting from a new random set of k centers, and keep the solution that maximizes QC.
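The restart procedure can be sketched with spherical k-means, that is, k-means that assigns each unit-length row to the center with the highest cosine similarity and re-normalizes each center's composite vector. This is an assumption on our part about how cosine similarity enters k-means; the helper names, the toy data, and k = 2 (the paper uses k = 50 on a far larger matrix) are illustrative only.

```python
import numpy as np

def qc(V, labels):
    """Clustering quality from Eqn (1): sum of composite-vector lengths (rows of V are unit length)."""
    return float(sum(np.linalg.norm(V[labels == c].sum(axis=0)) for c in np.unique(labels)))

def spherical_kmeans(V, k, rng, max_iter=100):
    """One k-means run using cosine similarity on unit-length rows."""
    centers = V[rng.choice(len(V), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        new_labels = np.argmax(V @ centers.T, axis=1)  # most similar center by cosine
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            composite = V[labels == j].sum(axis=0)
            norm = np.linalg.norm(composite)
            if norm > 0:  # keep the old center if the cluster is empty
                centers[j] = composite / norm
    return labels

def best_of_restarts(V, k, n_restarts=10, seed=0):
    """Repeat k-means from fresh random centers and keep the run with the largest QC."""
    rng = np.random.default_rng(seed)
    best_labels, best_q = None, -np.inf
    for _ in range(n_restarts):
        labels = spherical_kmeans(V, k, rng)
        q = qc(V, labels)
        if q > best_q:
            best_labels, best_q = labels, q
    return best_labels, best_q

phrases = ["long 6x optic zoom", "standard 3x optic zoom", "nice optic zoom",
           "16 mb", "4 mb card", "8 mb card"]
vocab = sorted({w for p in phrases for w in p.split()})
X = np.array([[1.0 if w in p.split() else 0.0 for w in vocab] for p in phrases])
V = X / np.linalg.norm(X, axis=1, keepdims=True)  # every toy phrase has at least one word
labels, q = best_of_restarts(V, k=2)
print(labels, round(q, 3))
```

Because each run starts from different random centers, some runs can settle on poor local optima; keeping the run with the largest QC is what makes the repeated restarts worthwhile.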

4.2. Dividing attributes into dimensions

Having generated phrase clusters corresponding to product attributes, our next objective is to

identify those attribute dimensions for which conjoint levels are defined. Recall that we can visualize the

phrases describing a single attribute in a table where attribute properties constitute table columns and

phrases constitute table rows; each word in a phrase is assigned to one column (see Figure 4). To derive

this figure, we generate the n-partite word co-occurrence graph by selecting the number of partitions n

and then assigning the words of each phrase to its appropriate dimension.

Colloquially, to discover the number of columns, we would like to find some combination of cus-

tomer comments or phrases that uses distinct words to make explicit reference to every relevant attribute

dimension. By modeling all words in a graph (see Section 3), we discover this combination of phrases by

heuristically searching for the largest maximal clique (see Appendix 2.2).

Applying this step to several phrases, for the digital camera attribute zoom, is depicted in Table 1.

From the left, the first column lists literal phrases taken from Pro/Con review summaries. The second

column lists the corresponding, normalized word form. A maximal clique is shown in the top row (i.e.

"nice," "6x," "optic," "zoom"). Note that this example illustrates how a maximal clique is constructed

from two or more phrases.
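The clique search is described only as heuristic (details are in Appendix 2.2), so the sketch below substitutes a common greedy, multi-start heuristic of our own choosing: from each seed word, repeatedly add the highest-degree word that is adjacent to every word already in the clique. Any clique built this way is maximal, though not necessarily maximum. The phrases are the tokenized forms from Table 1.

```python
from itertools import combinations

def build_graph(phrases):
    """Adjacency sets for the word co-occurrence graph."""
    adj = {}
    for phrase in phrases:
        words = set(phrase)
        for w in words:
            adj.setdefault(w, set())
        for a, b in combinations(words, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_clique(adj, seed):
    """Grow a clique from `seed`, trying high-degree words first; the result is maximal."""
    clique = [seed]
    for w in sorted(adj, key=lambda x: (-len(adj[x]), x)):
        if w not in clique and all(w in adj[u] for u in clique):
            clique.append(w)
    return clique

def largest_greedy_clique(adj):
    """Multi-start heuristic: try every word as a seed and keep the largest clique found."""
    best = []
    for seed in sorted(adj):
        clique = greedy_clique(adj, seed)
        if len(clique) > len(best):
            best = clique
    return best

# Tokenized phrases from Table 1.
table1 = [["zoom"], ["long", "6x", "optic", "zoom"], ["standard", "3x", "optic", "zoom"],
          ["nice", "optic", "zoom"], ["6x", "zoom", "nice"], ["5x", "optic", "zoom"]]
adj = build_graph(table1)
print(sorted(largest_greedy_clique(adj)))
```

On these phrases the heuristic finds a four-word clique, the same size as the clique shown in Table 1, although it need not contain the same words.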

Having identified the number of attribute dimensions, we can align the words in the remaining phrases with their corresponding columns. We assign words to attribute dimensions subject to the mutual exclusivity constraints represented by each phrase: no two words in the same phrase may appear in the same attribute dimension (the same column). Thus, each phrase represents $\binom{m}{2}$ pair-wise constraints, where m is the number of words in the phrase. Pair-wise constraints are consistent with disjoint clustering; they assume that no attribute dimension is described by two or more words in a phrase and that no single product can have two or more values for a single dimension (e.g. zoom is not both 2x and 3x).
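The count of pair-wise constraints is easy to verify. In this minimal sketch, `pairwise_constraints` is a hypothetical helper, and we assume a phrase contains no repeated word.

```python
from itertools import combinations
from math import comb

def pairwise_constraints(phrase):
    """Each unordered pair of words in a phrase must be placed in different dimensions."""
    return list(combinations(phrase, 2))

phrase = ["long", "6x", "optic", "zoom"]  # tokenized phrase from Table 1
constraints = pairwise_constraints(phrase)
print(len(constraints), comb(len(phrase), 2))  # 6 6
print(constraints[0])                          # ('long', '6x')
```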

We define the assignment problem using the maximal clique, and a constraint logic program (CLP) resolves it with a bounds consistency approach. In this approach, we invert each mutual exclusivity constraint and express the complementary constraint as a set of candidate assignments. If the phrase constraints, taken together, are internally consistent, then the candidate assignments for a given word are simply the intersection of the candidate assignments defined by all phrases in the cluster that contain that word.
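The candidate-set intersection can be sketched as a simple propagation loop. This is forward checking rather than a full bounds-consistency solver, and it is our illustration, not the authors' CLP: the clique words are pinned to columns P1 to P4 (indexed 0 to 3), every other word starts with all columns as candidates, and each phrase removes the columns already held by fixed words it contains. A word whose candidate set shrinks to a single column becomes fixed, and the loop repeats until nothing changes.

```python
def propagate(phrases, clique):
    """Narrow each word's candidate columns until no further word becomes fixed.

    clique: words of the maximal clique; word i is pinned to column i.
    Returns a dict mapping every word to its remaining candidate columns.
    """
    n = len(clique)
    fixed = {w: i for i, w in enumerate(clique)}
    words = sorted({w for p in phrases for w in p})
    domains = {w: ({fixed[w]} if w in fixed else set(range(n))) for w in words}
    changed = True
    while changed:
        changed = False
        for w in words:
            if w in fixed:
                continue
            candidates = set(range(n))
            for p in phrases:
                if w in p:
                    # a word cannot share a column with an already-fixed word from the same phrase
                    candidates -= {fixed[u] for u in p if u != w and u in fixed}
            if not candidates:
                raise ValueError(f"constraints on {w!r} are inconsistent")
            domains[w] = candidates
            if len(candidates) == 1:
                fixed[w] = next(iter(candidates))
                changed = True
    return domains

# Tokenized phrases and the maximal clique (P1..P4) from Table 1.
table1 = [["zoom"], ["long", "6x", "optic", "zoom"], ["standard", "3x", "optic", "zoom"],
          ["nice", "optic", "zoom"], ["6x", "zoom", "nice"], ["5x", "optic", "zoom"]]
clique = ["nice", "6x", "optic", "zoom"]
domains = propagate(table1, clique)
for w in ("long", "standard", "3x", "5x"):
    print(w, sorted(domains[w]))
# long [0]
# standard [0, 1]
# 3x [0, 1]
# 5x [0, 1]
```

On these six phrases the sketch pins "long" to P1 and, as in the paper, leaves "5x" undecided between P1 and P2. "standard" and "3x" also remain ambiguous here, because this simple propagation does not enforce the constraint that two unassigned words from the same phrase must differ.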

Continuing with the example in Table 1, the lower half of the table demonstrates how normalized

phrases from the left are mapped to attribute properties (columns). The interaction between adjacency

constraints from multiple phrases naturally constrains words to a unique assignment. The example also

illustrates two limitations of our strong assumption regarding maximal cliques. Given a sufficiently large sample of phrases, we assume that a maximum clique would encompass all (relevant) properties of a given product attribute. First, for reasons of computational complexity, we use a maximal (not necessarily a maximum) clique. As a consequence, our representative table row may miss certain properties, and words describing different properties may be erroneously combined. In Table 1, "standard" and "nice" are logically forced into the same property/column. Second, certain dimensions or levels may remain unassigned due to an insufficient number of examples. The word "5x" remains unassigned in Table 1 because it is under-constrained: whether "5x" is an instance of Property 1 or Property 2 is ambiguous. We revisit these limitations below.

4.3. Dividing properties into levels


The product review summaries are now reduced to a number of tables where each table represents

a product attribute and each table column is a dimension of the respective attribute. The values in each

column represent levels of the corresponding attribute dimension. For example, '4' and '8' are two levels

of digital camera memory capacity. Likewise, '3x' and '6x' are levels of optical zoom magnification. Unfortunately, our CLP algorithm may yield properties with more than the five or six values among which a customer can reasonably be asked to choose. To limit the number of levels for a given property, we apply distributional clustering (Pereira et al. 1993) to combine levels until the total number of levels is reduced to a specified target (e.g. six), which the user may easily modify. Further details appear in Appendix 2.3.
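Appendix 2.3 is not part of this section, so the sketch below is a deliberately simplified stand-in for distributional clustering: a hard, agglomerative merge that repeatedly joins the two levels whose context distributions have the smallest Jensen-Shannon divergence, rather than the soft clustering of Pereira et al. (1993). We assume each level's context is its count of co-occurring words in the other columns of the same table; the counts below are hypothetical.

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence between two count distributions (Counters)."""
    sp, sq = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    m = {k: 0.5 * (p[k] / sp + q[k] / sq) for k in keys}

    def kl(a, sa):
        return sum((a[k] / sa) * math.log((a[k] / sa) / m[k]) for k in keys if a[k] > 0)

    return 0.5 * kl(p, sp) + 0.5 * kl(q, sq)

def merge_levels(contexts, target):
    """Greedily merge the two most similar levels until at most `target` remain."""
    clusters = {(level,): Counter(ctx) for level, ctx in contexts.items()}
    while len(clusters) > target:
        keys = list(clusters)
        a, b = min(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda pair: js_divergence(clusters[pair[0]], clusters[pair[1]]))
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return sorted(clusters)

# Hypothetical co-occurrence counts for zoom-magnification levels.
contexts = {
    "3x": {"optic": 5, "standard": 3},
    "4x": {"optic": 4, "standard": 2},
    "10x": {"optic": 2, "worth": 3},
    "12x": {"optic": 1, "worth": 4},
}
print(merge_levels(contexts, target=2))  # [('10x', '12x'), ('3x', '4x')]
```

When a property already has no more than the target number of values, the loop never runs and every value stays its own level.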

4.4. Filtering the clusters

Both the initial clustering of phrases into product attributes and the subsequent assignment of

words to attribute properties are inherently imperfect. Inconsistencies may emerge for any number of

reasons, including poor parsing, the legitimate appearance of one word multiple times within a single phrase (e.g. the phrase ‘digital zoom and optical zoom’ duplicates the word ‘zoom’), or even “inaccuracies” by the human reviewers who write the text being automatically processed. Such inconsistencies can split a single attribute property across multiple table columns. For example, the reviews from Figure 3 include both "SmartMedia" as a single word and "Smart" and "media" as two separate words. Alternatively, multiple product attributes may appear in the same cluster: '[C]ompact flash' and 'compact camera' are clustered together because both use the word 'compact,' yet they refer to distinct attributes.

To address the problem of robustness in the face of noisy clusters that include references to addi-

tional product attributes or have different properties for the same attributes, we extend our CLP approach

to simultaneously cluster phrases and assign words. Detailed further in Appendix 2.4, the extended CLP

prunes phrases by recursively applying co-occurrence constraints; two phrases in the same review cannot

describe the same attribute just as two words in the same phrase cannot describe the same attribute di-

mension.
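The review-level constraint can be illustrated by flagging conflicting phrase pairs within a cluster. The sketch below only detects conflicts; the recursive pruning described in Appendix 2.4 is not reproduced, and the cluster (assembled from the Figure 4a phrases) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def review_conflicts(cluster):
    """Return pairs of phrases in a cluster that come from the same review.

    cluster: list of (review_id, phrase) pairs. Under the co-occurrence constraint,
    two phrases from one review should not describe the same attribute.
    """
    by_review = defaultdict(list)
    for review_id, phrase in cluster:
        by_review[review_id].append(phrase)
    return [pair for phrases in by_review.values() for pair in combinations(phrases, 2)]

# Hypothetical cluster assembled from the Figure 4a phrases.
cluster = [("HP", "only a 4 mb card"), ("Canon", "only 8 mb card"),
           ("Fuji", "16 mb"), ("Fuji", "compact")]
print(review_conflicts(cluster))  # [('16 mb', 'compact')]
```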


Unfortunately, even the extended CLP approach is imperfect. Some of the resulting tables represent distinct product attributes; others simply constitute random noise. Because individual tables are supposed to represent distinct product attributes, we assume that meaningful tables should share minimal word overlap. With this in mind, we apply a two-stage statistical filter to remove noisy clusters. Details of the statistical filter are provided in Appendix 2.5.

5. Evaluation

Early research in ontology induction was limited to “proof of procedure” based upon the subjec-

tive assessments of the researchers themselves and/or subject-matter experts (Missikoff and Navigli

2002). More recently, research in ontology induction and the analysis of customer reviews has begun to

develop more objective metrics. In this section, we report results from the application of our automated

process to a real domain. We select several popular print and online buying guides as the “gold standard”

and compare our automatically generated attributes and levels to that standard.

5.1. Data and metrics

Our data set consists of 8,226 online digital camera reviews downloaded from Epinions.com on

July 5, 2004. The reviews span 575 different products and product bundles that range in price from $45

to more than $1,000. The digital cameras range in resolution from 1MP to more than 6MP and vary in

size from pocket-size to single lens reflex (SLR).

We compare our automatically derived attributes to publicly available, expert-generated attributes

in the form of ten print and online buying guides. The reference sources each list a minimum of 5 product

attributes and a maximum of 26. The average number of product attributes is 14. After processing our

experimental data set, we compare the automatically induced attributes with each reference source. Bor-

rowing from the Information Retrieval literature, we use precision (P) (Salton and McGill 1983) to meas-

ure "how many of the induced attributes and/or properties are actually used in professional reviews and

online buying guides." By contrast, recall (R) asks "how many of the attributes and/or properties used in

practice are automatically induced?" More formally, if X is the set of generated attributes, and Y is the

set of attributes in the reference source,

$$P = \frac{|X \cap Y|}{|X|} \qquad \text{and} \qquad R = \frac{|X \cap Y|}{|Y|} \qquad (2, 3)$$
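Eqns (2) and (3) translate directly into set operations. In the sketch below, the attribute names are hypothetical placeholders chosen only so that the counts (47 generated attributes, 5 reference attributes, 3 exact matches) are consistent with the Epinions (A) row of Table 4 (P = 0.0638, R = 0.6).

```python
def precision_recall(generated, reference):
    """Eqns (2) and (3): P = |X & Y| / |X| and R = |X & Y| / |Y|."""
    X, Y = set(generated), set(reference)
    overlap = len(X & Y)
    return overlap / len(X), overlap / len(Y)

# Hypothetical names: 44 unmatched placeholders plus 3 attributes shared with the reference.
generated = {f"attribute_{i}" for i in range(44)} | {"zoom", "memory", "battery life"}
reference = {"zoom", "memory", "battery life", "price", "size"}
p, r = precision_recall(generated, reference)
print(round(p, 4), round(r, 4))  # 0.0638 0.6
```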

An added complexity to evaluation is the hierarchical nature of the product attribute space. Prod-

uct attributes in one reference source might be automatically extracted as a dimension or level and vice

versa (e.g. see Figure 6: optical zoom and digital zoom appear as independent attributes in the reference

source but as levels of the type dimension of a single zoom). To analyze precision and recall on product

attributes, we define precision and recall containment (P+ and R+) to allow a more specific term to qual-

ify as a positive match for a more general term, provided that the more specific term appears as a dimen-

sion or level, and vice versa.

The problem of containment is particularly acute when evaluating automatically generated levels

because there are so many more possibilities to consider. As a simplifying step, to measure precision and

recall for levels, we collapse the hierarchies into the union of all attributes and levels. In effect, a word

from the automatically generated hierarchy can match anywhere in the reference hierarchy and vice-versa.

The intuition is drawn from Popescu et al. (2004), who compared two hierarchies (ontologies) by compar-

ing all possible permutations of the sub-hierarchies.
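The collapsed-hierarchy comparison can be sketched with the Figure 6 example. We assume that multi-word names are split into individual words and that dimension names (here "type" and "magnification") are excluded, since the text mentions only attributes and levels. Under those assumptions, a strict comparison of attribute names finds no match, while the collapsed comparison matches every term.

```python
def collapse(hierarchy):
    """Union of the words in all attribute names and levels; dimension names are skipped."""
    terms = set()
    for attribute, children in hierarchy.items():
        terms.update(attribute.split())
        # children is either {dimension: [levels]} or a plain list of levels
        groups = children.values() if isinstance(children, dict) else [children]
        for levels in groups:
            for level in levels:
                terms.update(level.split())
    return terms

# The Figure 6 example.
automatic = {"zoom": {"type": ["digital", "optical"], "magnification": ["2x", "3x"]}}
reference = {"digital zoom": ["2x", "3x"], "optical zoom": ["2x", "3x"]}

strict = set(automatic) & set(reference)  # attribute names only: no exact match
A, R = collapse(automatic), collapse(reference)
overlap = len(A & R)
print(strict, overlap / len(A), overlap / len(R))  # set() 1.0 1.0
```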

5.2. Clustering phrases

Beginning with our set of customer reviews, we parsed the Pro and Con lists as described in Section 3 to produce a phrase × word matrix of 14,081 phrases by 3,364 words. We then set k = 50 and ran k-means clustering 10 times, selecting the best resulting output based upon QC (Eqn 1). We chose k = 50 following Popescu et al. (2004), using the union of product attributes from all of our reference buying guides as a guide to the number of attributes. Relying upon domain expertise in this way is consistent with practice: practitioners rely upon subjective judgments of what is most appropriate for the domain at hand (Tan et al. 2006). More objective, domain-independent measures for determining an optimal value of k remain an open research question.

Figure 6: Comparing Automatically Generated Attributes and Levels to a Reference Source (automatically generated: attribute zoom, with dimensions type {digital, optical} and magnification {2x, 3x}; reference source: attributes digital zoom and optical zoom, each with levels 2x and 3x)

Given an initial set of 50 clusters (from k-means), our next step is to further filter the initial clus-

ters into database tables. The CLP process produced a total of 672 smaller tables from the 50 initial clus-

ters. Applying a χ2 threshold of 0.001 and further filtering the results using the Spearman Rank test rs

(see Appendix 2.5), we are left with 47 tables or product attributes (see Table 2). Though we might have

expected 50 sub-clusters, one for each of the initial clusters, this is not the case. For some initial clusters,

none of the generated tables passed the statistical filters. In other cases, multiple tables from the same

initial cluster had the same, maximum rs score, delineating multiple product attributes within the same

initial cluster.

In the final step, we apply distributional clustering to elicit levels for every dimension (column) of every product attribute. Recall that we make two strong assumptions in extracting levels. First, we treat all levels as categorical, so even domains like memory capacity or megapixel resolution are treated as finite and discrete. This is consistent with conjoint practice, where even continuous attributes like price are treated as categorical so that non-linear utilities may be estimated. Second, we initially set the maximum number of levels to six, following much of the marketing literature (Lehmann et al. 1997). If a property of a product attribute has more than six distinct values, distributional clustering combines values until no more than six clusters remain. If there are six or fewer initial values, every value is assigned to its own level. Across our 47 tables, the number of properties ranged from 3 to 6 and averaged 4. Of the 183 properties that we generated, only 42 had more than six distinct values requiring clustering of levels.

Table 2: Automatically generated product attributes (47 automatically named attributes, with context comments in parentheses; examples include small (camera size; LCD), low resolution (image resolution; weight; battery usage), worth (accessories), lag (shutter lag), mega pixel (resolution), 4mb (memory), battery life, lens (cap), mpeg (video/audio), firewire (connectivity), red eye (flash), and large (size))

Informally, we can first evaluate the resulting product attributes, dimensions, and levels by man-

ual inspection. To facilitate the presentation, we apply a naïve convention for naming the database tables

that correspond to each attribute: scan the table for rows containing a single word (singletons). Label the

table with those singletons. For example, we would label the attribute described in Table 1 as "zoom." If

there are no singletons, name the table using the most frequent word(s) from the 2-tuples (or 3-tuples if

there are no 2-tuples etc.) Sometimes, the resulting names are not particularly revealing. For example,

the table containing "smart media" that derives from Figure 4 has no singletons. The most frequent word

from among the 2-tuples is "mb." Such a name is more relevant when considered in the product context.
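The naming convention is mechanical enough to sketch. In this hypothetical helper, ties are broken by returning every word that shares the top count; the memory rows are made up for illustration, whereas the zoom rows come from Table 1.

```python
from collections import Counter

def name_table(rows):
    """Name an attribute table: its singleton rows, else the most frequent word(s)
    among the shortest rows (2-tuples, then 3-tuples, and so on)."""
    singletons = sorted({row[0] for row in rows if len(row) == 1})
    if singletons:
        return singletons
    if not rows:
        return []
    shortest = min(len(row) for row in rows)
    counts = Counter(w for row in rows if len(row) == shortest for w in row)
    top = max(counts.values())
    return sorted(w for w, c in counts.items() if c == top)

table1_rows = [("zoom",), ("long", "6x", "optic", "zoom"), ("nice", "optic", "zoom")]
memory_rows = [("16", "mb"), ("32", "mb"), ("8", "mb", "card")]  # hypothetical rows
print(name_table(table1_rows))  # ['zoom']
print(name_table(memory_rows))  # ['mb']
```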

The full list of 47 potential attributes appears in Table 2. Comments in parentheses provide context for the automatically generated names and indicate where certain product attributes are duplicated. Of the 47 attributes, seven refer to picture or image quality, three refer to memory or picture storage, three refer to battery life, and two refer to special camera features or modes. Excluding duplicate and noisy clusters, the 25 remaining automatically generated attributes are fewer than the average number of attributes used in a recent survey of commercial, traditional conjoint studies (Hartmann and Sattler 2002), yet more than is common in academic settings.

For each product attribute, the CLP generated an assignment to elicit properties and levels. A

listing of automatically generated properties and levels for each of the 47 attributes is available in a sepa-

rate appendix available from the authors upon request. The four attributes and their associated levels

highlighted in Table 3 indicate directions for future research and improvement.

In particular, reading the table clockwise from the upper-left, the first attribute is memory. Two

limitations alluded to earlier are exemplified here. First, memory reveals the vulnerability of the CLP


method to levels or properties that involve more than one word. The word "4mb" embeds two properties:

capacity and unit. Conversely, the two words "compact" and "flash" are more appropriately parsed as a

single unit "compactflash." Second, this product attribute illustrates the limitations of our maximal vs.

maximum clique assumption. In context, "wimpy" and "small" modify capacity, hence they should con-

stitute a distinct property. Unfortunately, we did not find the corresponding constraints. Larger data sets

would likely ameliorate this issue.

The second attribute, lens cap, illustrates the impact of applying the CLP to a cluster that includes

multiple attributes. In this instance, two phrases involving the distance between the viewfinder and the

lens incongruously pair "cap" and "viewfinder." More significantly, lens cap illustrates a limitation of

applying distributional clustering to elicit levels.

The third and fourth attributes illustrate two different types of cluster impurities. In shutter lag, a single phrase, "long time battery," is inappropriately mixed with phrases about long shutter delay between pictures. In zoom, the phrase "10x zoom is worth every penny" is relevant, but introduces orthogonal properties. The implications are identical to using a maximal clique that omits key properties for the CLP assignment. As noted earlier, the problem of under-constrained words is also evident in the attribute zoom. Notably, "5x" and "7x" magnification are excluded from the attribute property because there is insufficient data to fully constrain the words "5x" and "7x."

Table 3: Automatically generated attribute levels (four attributes: 4mb (memory), lens (lens cap), lag (shutter lag), and 10x (zoom))

5.3. Precision and recall assessment

In addition to an informal reading of the automatically extracted attributes and levels, we also

compared automatically extracted product attributes (using precision and recall) to those attributes used in

several prominent online buying guides. The first three columns of Table 4 describe the reference

sources. Epinions (A) represents the attributes and levels by which customers can browse the digital

camera product mix and Epinions (B) represents a buying guide available on the Epinions website. CR02

– CR05 represent print buying guides from Consumer Reports for the years 2002 through 2005. The next

two columns report the strict standard where reference attributes must exactly match those that are auto-

matically generated. The columns labeled P+ and R+ apply the broader standard of containment, where

an attribute level in the reference may match a product attribute in the automatic output and vice versa. In

the final two columns, we consider the union of attributes and levels and report precision and recall. This

comparison is not applicable to Consumer Reports, which does not provide levels.

A quick review suggests that the automated extraction performs with varying quality relative to the online buying guides: recall containment (R+) is the highest, reaching 70%, while precision indicates lower alignment. Given that different sources may be subject to different biases, and that technologies evolve over time, we also considered all pairwise comparisons between the reference guides themselves. Table 5 reports the average precision and recall from evaluating each external source against every other one. The results suggest, at least in part, that the external sources are neither exhaustive nor even consistent with one another. Hence, a more appropriate benchmark for evaluation may be the internal consistency between the sources themselves. Assuming containment, the 0.56 average recall from our approach, labeled "Auto" in Table 5, exceeds all others. As one would then expect, our precision is correspondingly lower. However, if we account for the duplicate attributes noted in Table 2 above, our results (the column labeled "Dup" in Table 5) become competitive with the reference guides without a significant penalty in recall. Because an overly long, automatically generated list can be pruned afterward by a human analyst, such results are both encouraging and useful.

Reference | Total # attributes | Total # attributes + levels | P | R | P+ | R+ | Levels R
Epinions (A) | 5 | 29 | 0.0638 | 0.6 | 0.0638 | 0.6 | 0.3793
DPReview | 26 | 90 | 0.0426 | 0.0769 | 0.2128 | 0.3846 | 0.2414
Megapixel | 6 | 37 | 0.0638 | 0.5 | 0.1277 | 0.6667 | 0.3077
Bizrate | 15 | 85 | 0.0638 | 0.2 | 0.1702 | 0.5333 | 0.3059
CNet | 17 | 62 | 0.0851 | 0.2353 | 0.2128 | 0.4118 | 0.4242
Epinions (B) | 14 | 21 | 0.1702 | 0.5714 | 0.2766 | 0.6429 | 0.6190
CR02 | 20 | N/A | 0.1915 | 0.45 | 0.2128 | 0.45 | N/A
CR03 | 20 | N/A | 0.1915 | 0.45 | 0.2553 | 0.55 | N/A
CR04 | 20 | N/A | 0.0638 | 0.15 | 0.2766 | 0.65 | N/A
CR05 | 20 | N/A | 0.2128 | 0.5 | 0.3191 | 0.7 | N/A

Levels P for the six non-Consumer Reports sources (row order not recoverable): 0.0262, 0.0559, 0.0611, 0.0262, 0.0559, 0.0349

Table 4: Evaluating attributes and levels, K = 50

Most importantly, our features are induced directly from customer reviews. In some cases, our product features may not closely align with those in the marketing materials of manufacturers and retailers; rather than indicating an unsuccessful process, such inconsistencies may signal a managerially significant disconnect between buyers and producers. Of equal significance is the language used. While some of the terms may not articulate clean distinctions between levels, they do reflect the subjective language with which customers discuss the product. This is itself valuable to the field, because it allows one to distinguish what managers care about from what consumers state. In total, then, we are able to reproduce many aspects of the buying guides and to provide attributes and levels that consumers reference but that the guides never represented.

6. Discussion and Conclusions

In this paper, we have presented a system for automatically processing the text of online customer reviews. Phrases are parsed from the original text, then normalized and clustered. A novel logical assignment approach exploits the structure of review summaries to further separate phrases into sub-clusters and then assigns individual words to unique categories. The need to refine the system is symptomatic of broader conceptual and pragmatic issues, discussed briefly next.

Source | Auto | Dup | E(A) | DP | Mega | Biz | Cnet | E(B) | CR02 | CR03 | CR04 | CR05 | Mean
Precision | 0.21 | 0.32 | 0.69 | 0.33 | 0.57 | 0.41 | 0.27 | 0.48 | 0.24 | 0.26 | 0.27 | 0.37 | 0.39
Recall | 0.56 | 0.55 | 0.23 | 0.55 | 0.23 | 0.48 | 0.33 | 0.46 | 0.38 | 0.37 | 0.37 | 0.50 | 0.39

Table 5: Internal consistency: average precision and recall between one source and all others

6.1. Conceptual considerations

To generate initial clusters, the system requires phrases. Even though Epinions' customer reviews

provide lists of phrases, variations in human input are a source of noise. Moreover, we assume that a

phrase represents a single concept and that individual words represent distinct levels of attribute proper-

ties. We saw earlier how this assumption creates difficulties when managing feature lists (e.g. "digital

and optical zoom") as well as word sequences (e.g. "compact flash" vs. "compactflash").

One possible solution is to apply more sophisticated NLP techniques. For example, rather than treating each word as a unique term, NLP chunking techniques group words into phrases (e.g. treat “digital zoom” as a single concept rather than as an attribute “zoom” and a dimension or level “digital”). A second possibility is to substitute a different clustering technique. Because the phrase × word matrix is relatively sparse and comparatively high-dimensional, an entirely different approach that would preserve the unsupervised nature of our work is to identify phrases through frequent item-set analysis. Hu and Liu (2004) demonstrate that by tuning support and confidence thresholds, it is possible to discover whether a word pair represents one concept or two (e.g. “compact flash” versus “3x zoom”).
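For readers unfamiliar with these measures, the sketch below computes support and the two directional confidences for a word pair over a list of phrases. It shows only the generic association-rule measures, not Hu and Liu's (2004) full procedure, and the phrases are hypothetical.

```python
def association(phrases, a, b):
    """Support of {a, b} and the confidences of a -> b and b -> a over a list of phrases.

    Assumes both words occur in at least one phrase.
    """
    n = len(phrases)
    n_a = sum(a in p for p in phrases)
    n_b = sum(b in p for p in phrases)
    n_ab = sum(a in p and b in p for p in phrases)
    return n_ab / n, n_ab / n_a, n_ab / n_b

# Hypothetical tokenized phrases.
phrases = [{"compact", "flash"}, {"compact", "flash", "card"}, {"compact", "camera"},
           {"3x", "zoom"}, {"optic", "zoom"}, {"zoom"}]
print(tuple(round(x, 3) for x in association(phrases, "compact", "flash")))  # (0.333, 0.667, 1.0)
print(tuple(round(x, 3) for x in association(phrases, "3x", "zoom")))        # (0.167, 1.0, 0.333)
```

Here "flash" always appears with "compact" and "compact" usually appears with "flash," whereas "zoom" appears with "3x" in only one of three phrases; thresholds on both directions are one way to separate a single concept from a two-word combination.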

Challenges such as word sequences or parsing errors also appear in the logic-program alignment

of words to attribute dimensions. Even accurate parsing does not preclude issues with clique generation

and word assignment. First, we make the very strong assumption that a maximal clique, taken over a

word graph, serves as an adequate proxy for the proper number of attribute dimensions. There is no guar-

antee that our heuristic process does not miss a larger maximal clique that more completely captures the

corresponding concept attributes or that our maximal clique does not contain spurious dimensions due to

initial parsing errors. As seen earlier, these assumptions can lead to erroneously omitting attribute dimen-

sions and levels.


Second, there are a number of challenges for our logic-based assignment of words to attribute

dimensions. Homonyms can link concepts inappropriately. Synonyms and misspellings can segregate

concepts incorrectly. Poor parsing of multi-word sequences creates inappropriate constraints (e.g. separating ‘mega’ and ‘pixel’ versus ‘megapixel’). These word sequences are then manifested as multi-word

dimensions or levels. Likewise, insufficient constraints are akin to null values within a database table.

To improve the robustness of the alignment step, we are currently experimenting with a more tra-

ditional mixed integer programming formulation of the assignment problem. The introduction of a pen-

alty function may address the problem of conflicting constraints as well as address under-constrained

words. We are also attempting to identify additional sources of constraints. For example, incorporating

external data such as the associated manufacturer’s product description may prove extremely helpful.

Finally, we face the challenge of using distributional clustering to generate a semantically meaningful number of coherent levels. One aspect of the problem is that the appropriate number of levels may be smaller than the user-specified threshold of clusters: that a property (column) contains three words does not necessarily mean that each word represents a unique level. Instead, we could hierarchically cluster the levels of every property into anywhere between one and a user-specified maximum number of clusters (e.g. 6), and choose the number of clusters based upon cluster characteristics such as size or distribution.

Just as there may be fewer levels than the threshold, there could be too many. As observed earlier, generating coherent clusters from a large number of levels is problematic if all levels of the property being clustered share the same distribution over the remaining attribute properties. One solution is to draw upon domain knowledge to form clusters using other meaningful semantics, but that would likely reduce or eliminate the unsupervised nature of our algorithm. A second alternative, which might preserve the domain independence of the technique, is to bring in additional sources of data that force the distributions apart. Such sources might include phrases that co-occur with the levels being clustered, or manufacturer details of the products being reviewed. Manufacturer details are typically available alongside the product reviews themselves.


6.2. Pragmatic considerations

We need to be able to assess the stability of our clusters and the associated product features. One aspect of stability is sensitivity to sample size. Here, we relied upon a large data set to yield the phrases from which we identify a maximal clique. The large data set is also a boon because we can liberally discard phrases to minimize the effects of naïve parsing. To measure sensitivity to sample size, we would cross-validate on smaller sets of review samples. We can then plot the trade-off between sample size and evaluation metrics to identify diminishing returns and attempt to estimate the minimal number of reviews required. The issue of "when to construct the attributes and levels" for one's conjoint studies is an important one. Care must be taken to ensure sufficient heterogeneity in the selected sample with respect to different product features and the corresponding feature attributes.
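The sample-size experiment described above can be sketched as a learning curve. Here `metric` is a hypothetical callable that stands in for the full pipeline plus an evaluation against buying guides. The diminishing-returns rule (stop once the next sample size improves the score by less than `tol`) is one simple assumption among many possible choices.

```python
import random
from statistics import mean


def learning_curve(reviews, sizes, metric, n_repeats=5, seed=0):
    """Mean of `metric` over repeated random subsamples of each size.
    `metric` is a hypothetical callable: list of reviews -> score."""
    rng = random.Random(seed)
    return [mean(metric(rng.sample(reviews, n)) for _ in range(n_repeats))
            for n in sizes]


def minimal_sample_size(sizes, scores, tol):
    """Smallest size after which moving to the next size gains less than
    `tol`; returns the largest size if no plateau is observed."""
    for k in range(len(sizes) - 1):
        if scores[k + 1] - scores[k] < tol:
            return sizes[k]
    return sizes[-1]
```

For example, with hypothetical mean scores of 0.50, 0.70, 0.75 and 0.76 at 100, 200, 400 and 800 reviews and `tol=0.02`, the rule stops at 400 reviews.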

Finally, while our approach generalizes across different product domains, our dependence on sources that provide phrase-like strings is a limitation. At least two factors mitigate this limitation. First, there are other domains where text takes the form of phrase-like strings rather than prose. Progress notes in medical records and online movie reviews (Eliashberg and Shugan 1997) are two such examples. Second, recognizing the current limitations of natural language processing tools, more online sources are soliciting customer feedback in the form of phrases rather than prose to facilitate automated processing (Google 2007).

6.3. Future work

In addition to extending the conceptual and pragmatic aspects of our work, there are a number of ways in which we might enrich the concept relationships that we learn. For example, we currently learn both product attributes and attribute properties. However, depending upon how an attribute is decomposed, the levels of some properties may be mutually exclusive and others not. Most buying guides presented optical zoom and digital zoom as distinct attributes with properties such as magnification. However, it is also common to see a single product attribute, zoom, with properties for both magnification and type. Where "digital" and "optical" are both instances of the property zoom type, a single camera can take on multiple values of zoom type. By contrast, the levels of camera type, which include "slr," "standard," and "compact," are mutually exclusive. From a marketing and recommendation perspective, it might prove useful to extend our attributes and levels to distinguish between mutually exclusive property levels and those that are not.
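One simple way to detect this distinction from data is to check whether any single product is described with two different levels of the same property. This assumes each review can be tied to a product identifier, which is hypothetical here. The heuristic below, including its `min_support` guard against occasional reviewer or parsing errors, is an illustration rather than a method from the paper.

```python
from collections import Counter, defaultdict


def is_mutually_exclusive(observations, min_support=1):
    """observations: (product_id, level) pairs for one property.
    Heuristic: the property's levels are treated as mutually exclusive if no
    product shows two or more distinct levels, counting a level only when it
    is seen at least `min_support` times for that product."""
    support = Counter(observations)          # (product, level) -> count
    levels = defaultdict(set)
    for (product, level), n in support.items():
        if n >= min_support:
            levels[product].add(level)
    return all(len(ls) <= 1 for ls in levels.values())


# Hypothetical observations extracted from reviews.
zoom_type = [("cam1", "optical"), ("cam1", "digital"), ("cam2", "optical")]
camera_type = [("cam1", "compact"), ("cam2", "slr"), ("cam1", "compact")]
```

Zoom type fails the test because "cam1" is described as both optical and digital. Camera type passes it. Raising `min_support` trades sensitivity for robustness to noise.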

Memory capacity illustrates a second dimension of the relationship between attributes and properties: some properties are ordinal in nature. Recognizing order facilitates the related task of aligning orderings. For marketing and product design, alignment is critical because different customers may address a concept using parallel categories. For example, will 32 MB satisfy a customer seeking to store 130 images? Because of the relational assumption underlying our CLP approach, we can apply concept clustering (Gibson et al. 1998; Ganti et al. 1999) to group words from parallel categories.

Epinions is not the only source of online customer reviews that provides Pro/Con review summaries. We would like to integrate knowledge from multiple sources to augment the limited samples from a single source. One motivation might be to extend traditional recommender systems with user-driven, needs-based attributes derived from the language used by reviewers (Lee 2004; Adomavicius and Tuzhilin 2005).

While there are many buying guides that provide recommendations for specific products or services, most guides rely upon domain-specific experts. Unfortunately, reliance upon experts does not scale. The heterogeneity among both producers and users, together with the increasing complexity of products, calls for automated support in managing customer and product data. A critical step in providing such support lies in simply understanding the language used to describe a particular product category. In this paper, we present an unsupervised, domain-independent approach to learning the ontology of specific product categories from consumer feedback in the form of online customer reviews. Reviews are first pre-processed using shallow NLP techniques. The resulting phrases are normalized and then clustered into product attributes by adapting traditional document clustering algorithms. Further decomposition into attribute properties and levels is enabled by a novel bounds consistency approach to constraint logic programming: we treat the clustering as an assignment problem and exploit the co-occurrence structure of Pro/Con review summaries. We applied the method to a set of several thousand online reviews and evaluated the results against a collection of online buying guides. Although the automatically induced features do not perfectly align with the published guides, this is not necessarily an indication of poor performance; indeed, the different buying guides do not agree among themselves. Because our features are drawn directly from customer comments, the differences may reveal a significant opportunity for better managing the consumer-producer relationship. Moreover, as products evolve over time, so must the conjoint analyses that accompany them. We believe that our research can be an important first step in that direction.


References

Adomavicius, G. and A. Tuzhilin 2005. Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering 17(6): 734-749.

Baker, D. and A. McCallum 1998. Distributional Clustering of Words for Text Classification. SIGIR 98.

Borgelt, C. and R. Kruse 2002. Induction of Association Rules: Apriori Implementation. 15th Conference on Computational Statistics (Compstat).

Bradlow, E., Y. Hu and T.-H. Ho 2004. A Learning-based Model for Imputing Missing Levels in Partial Conjoint Profiles. Journal of Marketing Research 41(4): 369-381.

Cecchini, M. 2005. Quantifying the Risk of Financial Events Using Kernel Methods and Information Retrieval. Decision and Information Sciences, University of Florida. PhD.

Chen, Y. and J. Xie 2004. Online Consumer Review: A New Element of Marketing Communications Mix. Social Science Research Network, http://ssrn.com/abstract=618782

Chevalier, J. and D. Mayzlin 2003. The Effect of Word of Mouth Online: Online Book Reviews. Working Paper, Yale School of Management.

Dellarocas, C. 2003. The Digitization of Word of Mouth: Promises and Challenges of Online Feedback Mechanisms. Management Science 49(10): 1401-1424.

Dhillon, I. S., S. Mallela and R. Kumar 2002. Enhanced Word Clustering for Hierarchical Text Classification. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Doan, A., P. Domingos and A. Halevy 2003. Learning to Match the Schemas of Databases: A Multistrategy Approach. Machine Learning Journal 50: 279-301.

Eliashberg, J. and S. Shugan 1997. Film Critics: Influencers or Predictors? Journal of Marketing 61: 68-78.

Evgeniou, T., C. Boussios and G. Zacharia 2005. Generalized Robust Conjoint Estimation. Marketing Science 24(3): 415-429.

Ganti, V., J. Gehrke and R. Ramakrishnan 1999. CACTUS: Clustering Categorical Data Using Summaries. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Ghose, A., P. Ipeirotis and A. Sundararajan 2006. The Dimensions of Reputation in Electronic Markets. New York University: 32.


Gibson, D., J. Kleinberg and P. Raghavan 1998. Clustering Categorical Data: An Approach Based on Dynamical Systems. 24th International Conference on Very Large Databases (VLDB).

Goldberg, S. M., P. E. Green and Y. Wind 1984. Conjoint Analysis of Price Premiums for Hotel Amenities. Journal of Business 57(1): 111-132.

Google 2007. About Google Base. Google.

Green, P. E. and A. M. Krieger 1991. Segmenting Markets with Conjoint Analysis. Journal of Marketing 55: 20-31.

Green, P. E. and V. Srinivasan 1978. Conjoint Analysis in Consumer Research: Issues and Outlook. Journal of Consumer Research 5: 103-123.

Han, J. and Y. Fu 1994. Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases. AAAI 94 Workshop on Knowledge Discovery in Databases (KDD94).

Hartmann, A. and H. Sattler 2002. Commercial Use of Conjoint Analysis in Germany, Austria, and Switzerland. Research Papers on Marketing and Retailing, University of Hamburg: 14.

Hearst, M. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. Fourteenth International Conference on Computational Linguistics (COLING).

Hu, M. and B. Liu 2004. Mining and Summarizing Customer Reviews. KDD04.

Huber, J. and K. Zwerina 1996. The Importance of Utility Balance in Efficient Choice Designs. Journal of Marketing Research 33: 307-317.

Kilgarriff, A. 2001. Comparing Corpora. International Journal of Corpus Linguistics 6(1): 97-133.

Lee, L. 1999. Measures of Distributional Similarity. Association for Computational Linguistics (ACL 99).

Lee, T. 2004. Use-centric Mining of Customer Reviews. Workshop on Information Technology and Systems (WITS).

Lee, T. 2005. Ontology Induction for Mining Experiential Knowledge from Customer Reviews. Utah Winter Information Systems Conference.

Lehmann, D. R., S. Gupta and J. H. Steckel 1997. Marketing Research. Prentice Hall.

Liu, B., M. Hu and J. Cheng 2005. Opinion Observer: Analyzing and Comparing Opinions on the Web. WWW 2005.

Maedche, A. and S. Staab 2000. Semi-automatic Engineering of Ontologies from Text. Twelfth International Conference on Software Engineering and Knowledge Engineering (SEKE'2000).

Michalek, J. J., F. M. Feinberg and P. Y. Papalambros 2005. Linking Marketing and Engineering Product Design Decisions via Analytical Target Cascading. Journal of Product Innovation Management 22: 42-62.


Missikoff, M. and R. Navigli 2002. Integrated Approach to Web Ontology Learning and Engineering. IEEE Computer: 54-57.

Modica, G., A. Gal and H. Jamil 2001. The Use of Machine-Generated Ontologies in Dynamic Information Seeking. CoopIS 2001.

Moore, W. L., J. Gray-Lee and J. J. Louviere 1998. A Cross-Validity Comparison of Conjoint Analysis and Choice Models at Different Levels of Aggregation. Marketing Letters 9(2): 195-207.

Moore, W. L., J. J. Louviere and R. Verma 1999. Using Conjoint Analysis to Help Design Product Platforms. Journal of Product Innovation Management 16: 27-39.

Nasukawa, T. and J. Yi 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. K-CAP'03.

Netzer, O. and V. Srinivasan 2006. Adaptive Self-Explication of Multi-Attribute Preferences. Yale Center for Customer Insights.

Pereira, F., N. Tishby and L. Lee 1993. Distributional Clustering of English Words. Association for Computational Linguistics (ACL93).

Popescu, A.-M. and O. Etzioni 2005. Extracting Product Features and Opinions from Reviews. HLT-EMNLP.

Popescu, A.-M., A. Yates and O. Etzioni 2004. Class Extraction from the World Wide Web. AAAI 2004 Workshop on Adaptive Text Extraction and Mining (ATEM).

Salton, G. and M. McGill 1983. Introduction to Modern Information Retrieval. New York, McGraw-Hill.

Suryanto, H. and P. Compton 2000. Learning Classification Taxonomies from a Classification Knowledge Based System. ECAI 2000 Workshop on Ontology Learning.

Tan, P.-N., M. Steinbach and V. Kumar 2006. Introduction to Data Mining. Boston, Pearson Education, Inc.

Toubia, O., D. I. Simester, J. R. Hauser and E. Dahan 2003. Fast Polyhedral Adaptive Conjoint Estimation. Marketing Science 22(3): 273-303.

Turney, P. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.

Van den Bulte, C. 2000. New Product Diffusion Acceleration: Measurement and Analysis. Marketing Science 19(4): 366-380.


Wittink, D. R. and P. Cattin 1989. Commercial Use of Conjoint Analysis: An Update. Journal of Marketing 53: 91-96.

Wittink, D. R., L. Krishnamurthi and J. B. Nutter 1982. Comparing Derived Importance Weights Across Attributes. Journal of Consumer Research 10(1): 471-474.

Zaki, M. and M. Peters 2005. CLICKS: Mining Subspace Clusters in Categorical Data via K-partite Maximal Cliques. 21st International Conference on Data Engineering (ICDE05).

Zhao, Y. and G. Karypis 2002. Criterion Functions for Document Clustering: Experiments and Analysis. University of Minnesota Department of Computer Science/Army HPC Research Center. Minneapolis, MN, University of Minnesota: 30.


Appendix 1. Glossary of Terms

Bounds consistency approach: A class of strategies for solving constrained optimization problems. The classic bounds consistency approach interleaves backtracking search with constraint propagation.

Classifier system: A class of computational algorithms for labeling input instances.

Clique: A subset of the nodes in a graph such that all nodes in the subset are mutually adjacent to one another.

Complete graph: A graph where all nodes are mutually adjacent to one another (i.e. where all

nodes are connected to one another by an edge).

Generic reference ontology: An ontology that contains general terms and relationships common

to many domains of knowledge.

Graph: A set of nodes (also called vertices) and a set of edges. Edges are expressed as node pairs, each connecting its two constituent nodes.

K-partite: A graph whose nodes can be partitioned into K disjoint sets such that no two nodes within the same set are adjacent (i.e. there are no edges between nodes within the same partition).

Maximal clique: A clique that cannot be extended by adding another node, i.e. its set of nodes is not a subset of any larger clique.

Maximum clique: A maximal clique for which no larger clique exists in the graph. Finding a maximum clique is NP-hard (the corresponding decision problem is NP-complete).

Ontology: A structured vocabulary that captures the terms used in a particular domain as well as a set of relationships that hold between those terms. Common relationships captured within an ontology include Hyponym (A is a hyponym of B if A is a kind of B) and Meronym (A is a meronym of B if A is a part of B).

Ontology induction: The process of learning an ontology (generally associated with automated

methods for learning an ontology).

Sentiment analysis: A sub-field of natural language processing, content analysis, and computational linguistics that analyzes the emotional tenor or sentiment of a text passage, for example, whether the writer is happy, sad, pleased, or angry.

Stop-word lists: Commonly used in text processing, a predefined list of words that typically convey little or no semantic information and are therefore automatically discarded. These include articles and prepositions such as: "The, A, Of, From, To …"

Supervised learning: A class of machine learning approaches that learn a task by first being trained on a set of representative inputs for which the answers are known a priori. The learning classifier generalizes from the training instances in a systematic way.

Supervised learning classifier systems: Classifier systems that learn based upon a pre-labeled set

of training instances.

Weighted graph: A graph with weights associated with either the edges (an edge-weighted graph) or the nodes (a node-weighted graph).


Appendix 2. Algorithmic Details

In this Appendix, we elaborate on specific details of the algorithmic process described in Sections 3 and 4.

[1] Vector space model and word importance

Borrowing from the information retrieval community, our phrase × word matrix is a representation of the vector space model (VSM). More formally, j ∈ J is a word in the set of all words and i ∈ I is a phrase. A phrase is simply a finite sequence of words, so I is a subset of the set of finite word sequences over J, $\{\langle j_1, \ldots, j_n \rangle \mid j_k \in J\}$. We define an initial phrase × word matrix as a simple variation on the term-frequency inverse-document-frequency (TF-IDF) VSM (Salton and McGill 1983):

$$\mathrm{Matrix}(i,j) = TF_{ij} \times IPF_j \tag{2.1}$$

where the term frequency $TF_{ij}$ counts the total number of occurrences of word j across all instances of phrase i. The inverse phrase frequency $IPF_j = \log(|I|/n_j)$ upweights words that are more helpful in distinguishing between product attributes because they appear in only a fraction of the unique phrases. Here |I| is the total number of unique phrases in the review collection, and $n_j$ is the number of unique phrases containing word j.
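As a concrete reading of Eq. (2.1), the sketch below computes a sparse TF-IPF matrix from a list of phrase instances. It assumes that phrases are already normalized into tuples of words. The natural logarithm and the dictionary representation are also assumptions, since the text does not fix a log base or a data structure.

```python
import math
from collections import Counter


def tf_ipf(phrase_instances):
    """Sparse TF-IPF matrix as {(phrase, word): weight}.

    phrase_instances: every extracted phrase occurrence, each a tuple of
    normalised words (repeats allowed). TF counts the word over all instances
    of the phrase; IPF = log(|I| / n_j) is computed over unique phrases.
    The natural log is an assumption."""
    instances = Counter(phrase_instances)       # unique phrase -> instance count
    n_unique = len(instances)                   # |I|
    n_j = Counter(w for phrase in instances for w in set(phrase))
    matrix = {}
    for phrase, times in instances.items():
        for word, within in Counter(phrase).items():
            tf = times * within                 # occurrences across all instances
            matrix[(phrase, word)] = tf * math.log(n_unique / n_j[word])
    return matrix


# Hypothetical normalised phrases.
m = tf_ipf([("great", "zoom"), ("great", "zoom"), ("great", "battery")])
```

In this toy input, "great" occurs in every unique phrase, so its IPF is log 1 = 0. In real review data, such sentiment words occur in only some phrases and keep a positive weight, which is the limitation discussed next.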

A limitation of TF-IPF weighting is that some terms (e.g. sentiment words like "great" or "good") are neither stop words nor product attributes, yet still appear alongside product attributes in the TF-IPF matrix. As an additional discount factor beyond IPF, we automatically gather words from a second set of K phrases drawn from online reviews for an unrelated product domain. Intuitively, words appearing in reviews of unrelated products are less likely to represent relevant attributes of the focal product. For example, words describing digital camera attributes are unlikely to also appear in vacuum cleaner reviews.


Formally, for a set I′ of phrases drawn from the set of finite word sequences over J, we calculate $\mathrm{rank}(j) = \mathrm{rank}(TF'_{ij} \times IPF'_j)$, where higher weighted frequencies correspond to higher rank. Note that multiple words may share the same rank. If we define words that do not appear in any phrase of I′ as having $IPF'_j = 0$, then we may say:

$$\mathrm{Matrix}(i,j) = \left(TF_{ij} \times \mathrm{rank}(j)\right) \times \left(IPF_j - IPF'_j\right) \tag{1a}$$

Thus, we scale TF by the rank of the word in the unrelated product domain and discount IPF by the corresponding IPF′ from the unrelated product domain.

[2] Graph representations

To transform clusters of Pro/Con review phrases into individual tables, we model each set of phrases as a graph. Every word is a node, and every edge marks the co-occurrence of two words within the same phrase. The graph thus captures the assumption that no two words in a phrase refer to the same attribute dimension. In the same way, we could generate a graph where every phrase is a node and every edge between two phrases indicates that they co-occur within the same review. This captures the parallel assumption that no two phrases in a review refer to the same product attribute. We revisit this parallel assumption when we discuss filtering the phrase clusters that correspond to a single product attribute.

More formally, we assume that phrases are preprocessed and normalized into words as before. A graph G = (V, E) is a pair consisting of a set of vertices V and a set of edges E. An edge in E connects two vertices and may be represented as a pair (v_i, v_j) ∈ V × V. Each phrase (word) is a vertex v in the graph; edges are defined by phrase pairs within a review (word pairs within a phrase).
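The word graph, and the phrase graph when reviews take the place of phrases, can be built directly from co-occurrence. The adjacency-dictionary representation and the example inputs below are illustrative assumptions.

```python
from itertools import combinations


def cooccurrence_graph(groups):
    """Undirected co-occurrence graph as {node: set of neighbours}.

    `groups` is a list of word lists (phrases) for the word graph, or a list
    of phrase lists (reviews) for the phrase graph; members of the same group
    are joined by an edge."""
    adj = {}
    for group in groups:
        members = set(group)
        for m in members:
            adj.setdefault(m, set())
        for a, b in combinations(members, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical normalised phrases (word graph) and reviews (phrase graph).
word_graph = cooccurrence_graph([["3x", "optical", "zoom"], ["6x", "optical"]])
phrase_graph = cooccurrence_graph([[("3x", "optical"), ("long", "battery")],
                                   [("3x", "optical")]])
```

The absence of an edge between "3x" and "6x" is what allows both words to fall into the same dimension later.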

An N-partite graph is a graph whose vertices can be partitioned into N sets V_1, …, V_N such that no edge joins two vertices within the same set V_i; it is complete N-partite if every pair of vertices from different sets is joined by an edge. A clique of size N simulates a schema and can be extended to a complete N-partite graph by substituting each vertex v_i of the clique with a set of vertices V_i. A database table with disjoint columns thus represents an N-partite graph. A maximal complete N-partite graph is a complete N-partite graph not contained in any other such graph; in other words, the initial clique is maximal. The corresponding database table of phrases represents the existing product attribute space, and the maximal complete N-partite graph includes possibly novel combinations of previously unpaired attributes and/or attribute properties.

To relate the graph back to customer reviews, we say that a product attribute is constructed from k dimensions. Each dimension names a domain D, defined by a finite set of words that includes the value NULL for review phrases in which customers fail to mention one or more attribute dimensions. The Cartesian product of domains $D_1 \times \cdots \times D_k$ is the set of all k-tuples $\{(t_1, \ldots, t_k) \mid t_i \in D_i\}$. Each phrase is one such k-tuple, and the set of all phrases in the cluster defines a finite subset of the Cartesian product. A relational schema is simply a mapping of attribute properties $A_1, \ldots, A_k$ to domains $D_1, \ldots, D_k$. Note the strong, implicit assumption that a maximal clique, taken over a word graph, is a proxy for the proper number of attribute dimensions. Under this assumption, it is easy to see how searching for cliques within the graph results in a table.
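Once every word has been assigned to a dimension of the schema, each phrase becomes one k-tuple of the table, with NULL wherever the phrase is silent. The sketch below represents NULL as Python's `None`. The dimension names and the word-to-dimension assignment are hypothetical.

```python
def phrases_to_rows(phrases, assignment, schema):
    """Map each phrase to a k-tuple over `schema`, using None for NULL when
    the phrase does not mention a dimension. Unassigned words are dropped
    (an assumption). Under the mutual-exclusivity assumption no two words of
    a phrase share a dimension; if they did, the later word would win."""
    rows = []
    for phrase in phrases:
        row = dict.fromkeys(schema)              # every dimension starts as NULL
        for word in phrase:
            dim = assignment.get(word)
            if dim is not None:
                row[dim] = word
        rows.append(tuple(row[d] for d in schema))
    return rows


schema = ("magnification", "type")               # hypothetical dimension names
assignment = {"3x": "magnification", "6x": "magnification",
              "optical": "type", "digital": "type"}
rows = phrases_to_rows([["3x", "optical"], ["digital"], ["6x", "optical"]],
                       assignment, schema)
```

Each resulting row is one element of the finite subset of $D_1 \times \cdots \times D_k$ described above.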

[3] Distributional clustering

In distributional clustering, the values of one attribute property are characterized by their joint distribution over the remaining attribute properties. In the example in Figure 4, levels of memory capacity (e.g. 4, 8, 16) are defined by their joint distribution over form factor and whether the memory is included or not. The intuition is that certain memory types (e.g. compact flash, smart media, xD) are generally used with certain memory capacities and not others, as is common with real products; capacities that share similar distributions can therefore be collapsed into a single level.

More precisely, recall that every product attribute is described as a table whose columns represent properties. We assume that all attribute properties are defined over discrete, categorical domains. For each column of the table, every unique value is initialized to a distinct level. Levels are defined in terms of the joint probability space over all other columns in the table. We construct the joint probability distribution for each level from the table rows and represent each distribution as a sparse vector. If a column contains more than a user-specified number of levels, the levels are hierarchically clustered based upon the cosine similarity of their distributions (Lee 1999).
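The procedure above can be sketched as follows. Each level of one column is represented by a sparse count vector over the joint values of the other columns. Clusters are then merged greedily by cosine similarity until at most `max_levels` remain. Pooling the counts of merged levels is an assumed linkage rule, since the text does not specify one. Because cosine similarity ignores vector length, raw counts give the same similarities as normalized distributions. The rows are hypothetical and are not the data of Figure 4.

```python
import math
from collections import Counter


def level_distributions(rows, col):
    """Map each value of column `col` to a Counter over the joint values of
    all remaining columns (a sparse, unnormalised distribution)."""
    dists = {}
    for row in rows:
        rest = tuple(v for k, v in enumerate(row) if k != col)
        dists.setdefault(row[col], Counter())[rest] += 1
    return dists


def cosine(p, q):
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def cluster_levels(rows, col, max_levels):
    """Agglomerative clustering: merge the two most cosine-similar clusters
    (pooling their counts) until at most `max_levels` remain."""
    clusters = [({level}, dist)
                for level, dist in level_distributions(rows, col).items()]
    while len(clusters) > max_levels:
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: cosine(clusters[ab[0]][1], clusters[ab[1]][1]))
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [members for members, _ in clusters]


# Hypothetical rows: (capacity, memory type, included?).
rows = [("4", "compact flash", "included"), ("8", "compact flash", "included"),
        ("16", "xd", "not included"), ("32", "xd", "not included")]
levels = cluster_levels(rows, col=0, max_levels=2)
```

Capacities that co-occur with the same memory type and inclusion status end up in the same level.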

The examples also illustrate some inherent limitations of distributional clustering in this context. First, the approach is sensitive to relative semantics. In Table 1, levels of optical zoom magnification (e.g. 3x, 6x) are defined by their joint distribution over descriptive adjectives like "standard" versus "long." Thus, all magnifications described as "long" would be clustered together. However, while some users might consider 3x magnification "standard" today, others might describe it as "poor" as technology evolves or depending upon their needs. Second, as with the CLP step, distributional clustering relies upon a sufficiently large, representative sample. The limited sample of phrases in Figure 4 would treat "4" and "16" as one cluster of memory capacity, separate from "8."

[4] Constraint Logic Programming

To align words with their corresponding attribute dimensions, we frame the task as a mathematical assignment problem and solve it using a bounds consistency approach. We define the assignment using the maximal clique that corresponds to the schema for each product attribute table (see Figure 7). In the bounds consistency approach, we invert the exclusion constraints (tok_exclusion) to express the complementary set of candidate assignments (tok_candidates) for each token. If the phrase constraints, taken together, are internally consistent, then the candidate assignments for a given token are simply the intersection of the candidate assignments defined by all phrases in the cluster that contain that token, and the token's assignment (tok_assign) is drawn from that intersection.

process_phrases(p_list)
    schema = find_maximal_clique(p_list)
    order phrases by length
    # initialize data structures
    tok_exclusion  – for each tok, mutually exclusive tokens
    tok_candidates – for each tok, valid candidate assignments
    tok_assign     – for each tok, the dimension assignment
    # propagate the constraints for each successive phrase
    for each phrase in p_list:
        tok_candidates, tok_exclusion, tok_assign =
            propagate_bounds(phrase, tok_candidates,
                             tok_exclusion, tok_assign, schema)

Figure 7. Logical Assignment


We transform the mutual exclusivity constraint represented by each phrase into a set of candidate assignments using the algorithm in Figure 8. Note that we need only propagate the mutual exclusivity of words that are not yet assigned. Accordingly, for each unassigned token in a given phrase, the set of candidate assignments is the intersection of the possible assignments based upon the current phrase and all candidate assignments from earlier phrases containing the same token. We maintain a list of active tokens, boundary_list, to avoid rescanning the set of all tokens every time the possible assignments for a given token are updated.
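A simplified version of this propagation can be written in a few lines. The sketch assumes that each word of the maximal clique seeds its own dimension. It uses a work list of phrases in place of `boundary_list` and does not implement the authors' `recurse_boundary`. For each unassigned word, it removes the dimensions already held by other words in the same phrase, and it fixes the word once a single candidate remains. The phrases are hypothetical.

```python
from collections import deque


def propagate(phrases, schema):
    """Simplified constraint propagation for word-to-dimension assignment.

    Assumptions: each word of the maximal clique `schema` names its own
    dimension, and every phrase forbids two of its words from sharing a
    dimension. A word's candidates are intersected with what each phrase
    allows; a word left with one candidate is assigned, and the phrases
    containing it are revisited."""
    assigned = {w: w for w in schema}            # clique words seed dimensions
    containing = {}
    for idx, phrase in enumerate(phrases):
        for t in set(phrase):
            containing.setdefault(t, []).append(idx)
    candidates = {t: set(schema) for t in containing if t not in assigned}
    queue = deque(range(len(phrases)))
    while queue:
        phrase = phrases[queue.popleft()]
        taken = {assigned[t] for t in phrase if t in assigned}
        for t in [t for t in set(phrase) if t not in assigned]:
            candidates[t] -= taken               # an empty set signals a conflict
            if len(candidates[t]) == 1:
                assigned[t] = next(iter(candidates[t]))
                queue.extend(containing[t])      # re-check phrases containing t
    return assigned, candidates


phrases = [["3x", "optical"], ["6x", "optical"], ["3x", "digital"]]
assigned, candidates = propagate(phrases, schema=["3x", "optical"])
```

An empty candidate set would signal conflicting constraints, and a word whose candidate set never shrinks to one is under-constrained. This sketch leaves both cases unassigned.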

Finally, the k-means clustering used to separate review phrases into distinct product attributes is a noisy process that can easily include spurious phrases. By modeling reviews as a graph of phrases, we can apply the same CLP in a pre-assignment step to filter a single (noisy) cluster of phrases. As alluded to in Appendix 2.2, we generate a graph where phrases are nodes and edges represent the co-occurrence of two phrases within the same review. The same assignment procedure then removes phrases that are not central to the product attribute at the heart of a particular phrase cluster.

propagate_bounds(phrase, tok_candidates, tok_exclusion, tok_assign, schema)
    # marshal prior assignments
    unassigned_tok = {t | t ∈ phrase ∧ t ∉ tok_assign}
    unassigned_attr = {a | a ∈ schema ∧ ∀t (t ∉ phrase ∨ a ∉ tok_assign[t])}
    for each t in unassigned_tok:
        tok_exclusion[t] = (t × (unassigned_tok − {t})) ∪ tok_exclusion[t]
        possible_assign = unassigned_attr ∩ tok_candidates[t]
        boundary_list = {(t, possible_assign)} ∪ boundary_list
    recurse_boundary(boundary_list, tok_exclusion, tok_assign)
    return tok_candidates, tok_exclusion, tok_assign

Figure 8. Propagate boundary constraints

[5] Statistical filtering

As noted in Section 4, the clusters that result from the CLP are not necessarily clean. To clean the resulting tables of product attributes and dimensions, we apply a two-stage statistical filter that rests on two assumptions. First, because each table separates tokens into attribute properties (columns), a meaningful table will not hold too small a percentage of the overall number of tokens. Second, meaningful tables comprise (predominantly) disjoint token subsets. If the tokens in a table appear in no other table, then the intra-table token frequencies should match those of the initial k-means cluster; likewise, the table's token order should match the relative order in the initial cluster. The first stage of our statistical filter evaluates a χ² statistic comparing each table to its corresponding initial cluster. Although there is no hypothesis to be tested per se, there is a history of applying the χ² statistic in linguistics research to compare different sets of text, because it gives higher-frequency tokens greater weight than lower-frequency tokens (Kilgarriff 2001). In our case, we set a minimum threshold on the χ² statistic to ensure that individual tables reflect an appropriate percentage of tokens from the initial cluster.
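The first filtering stage can be sketched as follows. The sketch assumes the Pearson χ² statistic over a 2 × N contingency table, with the CLP table and its initial cluster as rows and tokens as columns. This is one common form of the corpus-comparison statistic. The exact variant and threshold value used in the paper are not specified here, so the function only returns the statistic.

```python
def chi_square(table_counts, cluster_counts):
    """Pearson chi-square over a 2 x N contingency table: rows are the CLP
    table and its initial cluster, columns are tokens (counts as dicts).
    Frequent tokens contribute more, as in corpus comparison."""
    n_table = sum(table_counts.values())
    n_cluster = sum(cluster_counts.values())
    if n_table == 0 or n_cluster == 0:
        raise ValueError("both token lists must be non-empty")
    total = n_table + n_cluster
    stat = 0.0
    for tok in set(table_counts) | set(cluster_counts):
        o_table = table_counts.get(tok, 0)
        o_cluster = cluster_counts.get(tok, 0)
        col_total = o_table + o_cluster
        if col_total == 0:
            continue                              # token never observed
        for observed, row_total in ((o_table, n_table), (o_cluster, n_cluster)):
            expected = col_total * row_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical token counts.
balanced = chi_square({"a": 2, "b": 2}, {"a": 4, "b": 4})   # same proportions
disjoint = chi_square({"a": 10}, {"b": 10})                 # no shared tokens
```

Identical token proportions give a statistic of zero, and completely disjoint token lists give a large value. Where the paper's minimum threshold falls on this scale is left open.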

After filtering out tables that do not satisfy the χ² threshold, we use the same cluster token counts to calculate rank order statistics. We compare the token rank order of each constituent table to that of the corresponding initial cluster using a modified Spearman rank correlation coefficient ($r_s$). As a minor extension, we use relative token rank: we preserve the order but keep only tokens that appear in both the initial and the iterated CLP cluster(s). We select as significant those tables that maximize $r_s$.
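The relative-rank comparison can be sketched as follows. Only tokens present in both the table and its initial cluster are kept, each side is re-ranked by descending count, and the standard Spearman formula is applied. Breaking ties alphabetically and using the no-ties formula are assumptions, since the paper's specific modification is not described.

```python
def relative_ranks(counts, keep):
    """Rank the tokens in `keep` by descending count (1 = most frequent);
    ties are broken alphabetically (an assumption)."""
    order = sorted(keep, key=lambda t: (-counts[t], t))
    return {t: r for r, t in enumerate(order, start=1)}


def relative_spearman(table_counts, cluster_counts):
    """Spearman's r_s on relative ranks of the shared tokens only."""
    shared = set(table_counts) & set(cluster_counts)
    n = len(shared)
    if n < 2:
        return float("nan")                      # undefined for fewer than 2 tokens
    r_table = relative_ranks(table_counts, shared)
    r_cluster = relative_ranks(cluster_counts, shared)
    d2 = sum((r_table[t] - r_cluster[t]) ** 2 for t in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical counts; "x" and "y" are not shared and are ignored.
same = relative_spearman({"a": 5, "b": 3, "c": 1, "x": 9},
                         {"a": 10, "b": 6, "c": 2, "y": 100})
flipped = relative_spearman({"a": 1, "b": 3, "c": 5}, {"a": 10, "b": 6, "c": 2})
```

In the first example, the unshared tokens "x" and "y" would dominate an absolute ranking. Restricting the comparison to shared tokens is what makes the two orders agree perfectly.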
