HARVESTING AND SUMMARIZING USER GENERATED CONTENT FOR ADVANCED SPEECH BASED HCI S. APARNA M-Tech CIS Roll no: 13
Page 1: HARVESTING  AND SUMMARIZING USER GENERATED  CONTENT FOR  ADVANCED  SPEECH  BASED HCI, IEEE 2012

HARVESTING AND SUMMARIZING USER GENERATED CONTENT FOR ADVANCED SPEECH BASED HCI

S. APARNA

M-Tech CIS

Roll no: 13


ABSTRACT

This work presents a speech-based interface that aggregates user-generated content and presents summarized information through speech-based human-computer interaction.

Two challenges:

- to interpret the semantics and sentiment of the data

- to develop a dialogue modeling mechanism

We introduce:

- a parse-and-paraphrase paradigm

- a sentiment scoring mechanism

- sentiment-involved opinion summarization

- dialogue modeling approaches


CONTENTS

1. INTRODUCTION

2. PROBLEM FORMULATION

3. DIALOGUE-ORIENTED UNSTRUCTURED DATA PROCESSING

3.1 PARSE AND PARAPHRASE PARADIGM FOR PHRASE EXTRACTION

3.2 LINEAR ADDITIVE MODEL FOR SENTIMENT DEGREE SCORING

3.3 PHRASE CLASSIFICATION AND OPINION SUMMARY GENERATION

3.4 DIALOGUE MODELING

CONCLUSION


1. INTRODUCTION

The Web has been exploding with user-generated content (UGC).

To help users, the information representation and the interface for content access need to be improved.

This work introduces a condensed information representation method and a virtual assistant.

E.g., a restaurant-domain prototype system has been implemented.


This system understands the user’s request and finds the target restaurants. It

- summarizes multiple retrieved entries in a natural sentence.

- summarizes the reviews on each restaurant and makes recommendations.

This work aims to

- build a conversational system that harvests UGC

- present it through natural dialogue interaction.


2. PROBLEM FORMULATION

The task is to filter context-irrelevant information out of UGC and to present an informative summary.

In a text-based system, information can be obtained by scanning the text.

In a spoken dialogue system, the information space is very limited.

Two challenges:

1) to equip a standard dialogue system to extract context-relevant information and summarize it into an aggregated form.

2) to present condensed information as sophisticated dialogues.


3. DIALOGUE-ORIENTED UNSTRUCTURED DATA PROCESSING

An information aggregation system should form a condensed information representation and utilize it as a knowledge base for multimodal data access services.

An example of user-generated content is shown.


A possible representation format is shown in Table I.


Firstly, the representative phrases have to be identified and extracted.

Secondly, the sentiment in these extracted opinion-related phrases has to be scored on a numerical scale in order to calculate aspect ratings.

Thirdly, a condensed summary has to be generated.

An advanced dialogue modeling mechanism is used to present the information in natural sentences.


The figure shows the pipeline of the process.


UGC is passed through a linguistic parser for context-relevant phrase extraction.

A cumulative offset model estimates the sentiment degrees of the extracted expressions.

A classification model selects high-quality phrases for further topic clustering and aspect rating.

The resulting summary database is accessed by the dialogue system.

A sentiment-support dialogue modeling mechanism generates recommendation-like conversations.
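
The offline part of this pipeline can be pictured as a chain of four stages. The sketch below is only an illustration under stated assumptions: the function names (`run_pipeline`, `toy_extract`, `toy_score`, `toy_keep`, `toy_summarize`) and the toy stage logic are hypothetical stand-ins, not the components used in the original work.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Scored = Tuple[str, float]


def run_pipeline(
    reviews: Iterable[str],
    extract: Callable[[str], List[str]],
    score: Callable[[str], float],
    keep: Callable[[str, float], bool],
    summarize: Callable[[List[Scored]], Dict],
) -> Dict:
    """Chain the stages: extraction, sentiment scoring, selection, summary."""
    phrases = [p for review in reviews for p in extract(review)]
    scored = [(p, score(p)) for p in phrases]
    selected = [(p, s) for p, s in scored if keep(p, s)]
    return summarize(selected)


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_extract(review: str) -> List[str]:
    # Pretend every sentence is one candidate phrase.
    return [s.strip() for s in review.split(".") if s.strip()]


def toy_score(phrase: str) -> float:
    # Pretend sentiment: 4.5 for phrases containing "good", 2.0 otherwise.
    return 4.5 if "good" in phrase else 2.0


def toy_keep(phrase: str, score: float) -> bool:
    # Pretend classifier: keep short phrases only.
    return len(phrase.split()) <= 3


def toy_summarize(selected: List[Scored]) -> Dict:
    return {
        "phrases": [p for p, _ in selected],
        "rating": mean(s for _, s in selected),
    }


summary = run_pipeline(
    ["good food. slow service. we went there on a rainy tuesday"],
    toy_extract, toy_score, toy_keep, toy_summarize,
)
```

The dialogue system would then read `summary` (or a database of such entries) at run time; each stage can be replaced independently.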


3.1 PARSE AND PARAPHRASE PARADIGM FOR PHRASE EXTRACTION

The goal is to extract opinion-relevant phrases from UGC.

A parse-and-paraphrase paradigm extracts adverb-adjective-noun phrases from unstructured documents.

Sentences are parsed into a hierarchical representation known as a linguistic frame.

An example linguistic frame is shown; it encodes the parsing results of the sentence “The caesar with salmon or chicken is really quite good.”
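
As a rough illustration of the idea, the sketch below uses a nested dictionary as a stand-in for a linguistic frame and flattens it into an adverb-adjective-noun phrase. The frame layout and the key names (`topic`, `predicate`, `adverbs`, `adjective`) are assumptions made for this example; the frame format of the actual parser is not reproduced here.

```python
# Hypothetical nested-dict stand-in for the linguistic frame of
# "The caesar with salmon or chicken is really quite good."
frame = {
    "type": "statement",
    "topic": {"name": "caesar", "modifier": "with salmon or chicken"},
    "predicate": {"adjective": "good", "adverbs": ["really", "quite"]},
}


def paraphrase(frame: dict) -> str:
    """Flatten a frame into an 'adverb(s) adjective noun' phrase."""
    predicate = frame["predicate"]
    words = list(predicate.get("adverbs", []))
    words.append(predicate["adjective"])
    words.append(frame["topic"]["name"])
    return " ".join(words)


phrase = paraphrase(frame)  # "really quite good caesar"
```

Because the phrase is built from the hierarchical frame rather than from adjacent words, the adjective is attached to "caesar" even though "salmon or chicken" stands between them in the sentence.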


3.2 LINEAR ADDITIVE MODEL FOR SENTIMENT DEGREE SCORING

The goal is to estimate a numerical sentiment degree for each expression at the phrase level.

Sentiment scoring makes use of community users’ ratings.

By associating the ratings with the review texts, we can associate numerical scores with textual sentiment.

When calculating the sentiment score, we consider adverbs and adjectives separately.

Fig. 5. Illustration of generating the sentiment scale for adjectives from original reviews and ratings published by different users.
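
One way to read the scoring idea is sketched below. It is a simplified illustration, not the exact model of the original work: the observation tuples are invented, the 1-to-5 rating scale with midpoint 3.0 is an assumption, and the helper names (`adjective_scores`, `adverb_offsets`, `phrase_score`, `polarity`) are hypothetical. An adjective is scored by the average rating of the reviews it appears in, and an adverb contributes an offset that is added in the direction of the adjective's polarity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (user rating, adverb or None, adjective) observations.
observations = [
    (5.0, None, "good"), (4.0, None, "good"),
    (5.0, "very", "good"), (5.0, "very", "good"),
    (2.0, None, "bad"), (1.0, None, "bad"),
    (1.0, "very", "bad"),
]


def polarity(adj_score: float, midpoint: float = 3.0) -> float:
    """+1 for adjectives rated above the assumed midpoint, -1 otherwise."""
    return 1.0 if adj_score >= midpoint else -1.0


def adjective_scores(obs):
    """Average rating of the reviews in which each bare adjective occurs."""
    ratings = defaultdict(list)
    for rating, adverb, adjective in obs:
        if adverb is None:
            ratings[adjective].append(rating)
    return {adj: mean(r) for adj, r in ratings.items()}


def adverb_offsets(obs, adj_score):
    """Average shift an adverb causes, signed by the adjective's polarity."""
    shifts = defaultdict(list)
    for rating, adverb, adjective in obs:
        if adverb is not None and adjective in adj_score:
            base = adj_score[adjective]
            shifts[adverb].append(polarity(base) * (rating - base))
    return {adv: mean(s) for adv, s in shifts.items()}


def phrase_score(adverb, adjective, adj_score, adv_offset):
    """Linear additive score: adjective score plus signed adverb offset."""
    base = adj_score[adjective]
    return base + polarity(base) * adv_offset.get(adverb, 0.0)


adj_score = adjective_scores(observations)
adv_offset = adverb_offsets(observations, adj_score)
very_good = phrase_score("very", "good", adj_score, adv_offset)
very_bad = phrase_score("very", "bad", adj_score, adv_offset)
```

With this toy data, "very" pushes "good" upward and "bad" downward, which is why adverbs and adjectives are handled separately.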


3.3 PHRASE CLASSIFICATION AND OPINION SUMMARY GENERATION

The next step is to choose the most representative phrases to generate an opinion summary database.

The task of phrase selection can be defined as a classification problem that predicts a label y from a feature vector x using a coefficient vector θ, where

- y is the label of a phrase, assigned the value ‘1’ if the phrase is highly informative and relevant, and ‘-1’ if the phrase is uninformative.

- x is the feature vector extracted from the phrase.

- θ is the coefficient vector.

Classification models such as decision trees can be trained to separate highly informative phrases from uninformative ones.

We extract a set of features for model training.

These features are treated as the components xi of the feature vector x.
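
A minimal sketch of such a classifier is given below. It is an assumption-laden illustration: instead of a decision tree it uses a simple perceptron (chosen only for brevity), the decision rule is taken to be the sign of θ · x, and the three features per phrase as well as the training data are invented for the example.

```python
from typing import List, Sequence


def classify(theta: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (informative) or -1 (uninformative) from the sign of theta . x."""
    score = sum(t * xi for t, xi in zip(theta, x))
    return 1 if score >= 0 else -1


def train_perceptron(samples: List[List[float]], labels: List[int],
                     epochs: int = 100) -> List[float]:
    """Perceptron training: adjust theta whenever a phrase is misclassified."""
    theta = [0.0] * len(samples[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if classify(theta, x) != y:
                theta = [t + y * xi for t, xi in zip(theta, x)]
                mistakes += 1
        if mistakes == 0:  # every training phrase is classified correctly
            break
    return theta


# Hypothetical features xi per phrase:
# [bias term, distance of sentiment score from neutral, log frequency].
X = [[1.0, 2.0, 1.5], [1.0, 0.2, 0.5], [1.0, 1.8, 1.0], [1.0, 0.1, 0.2]]
y = [1, -1, 1, -1]

theta = train_perceptron(X, y)
predictions = [classify(theta, x) for x in X]
```

Any other trainable classifier could be substituted; only the interface (feature vector in, label ‘1’ or ‘-1’ out) matters for the rest of the pipeline.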


The sentiment score of each phrase, generated by the cumulative offset model, is used as a sentiment feature.

To capture semantic importance, the topics of phrases are clustered into generic semantic categories.

Table II gives some topic clustering examples.

For example, in the restaurant domain the category of “food” contains various topics from generic sub-categories.
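
The grouping step can be illustrated with a small lookup-based sketch. The lexicon `CATEGORY_OF` is invented for this example and does not reproduce Table II; the sketch also assumes that the topic noun is the last word of an adverb-adjective-noun phrase.

```python
from collections import defaultdict

# Hypothetical topic-to-category lexicon (not the contents of Table II).
CATEGORY_OF = {
    "caesar": "food", "salad": "food", "pizza": "food",
    "waiter": "service", "staff": "service",
    "decor": "atmosphere", "music": "atmosphere",
}


def cluster_by_category(phrases):
    """Group phrases under the generic category of their topic noun."""
    clusters = defaultdict(list)
    for phrase in phrases:
        topic = phrase.split()[-1]  # assumption: the noun comes last
        clusters[CATEGORY_OF.get(topic, "general")].append(phrase)
    return dict(clusters)


clusters = cluster_by_category(
    ["really quite good caesar", "friendly staff", "loud music", "nice place"]
)
```

Topics that are not in the lexicon fall back to a catch-all "general" category in this sketch.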


This sequence of phrase classification and topic categorization results in a summary database.

An example database entry is shown in Table III. It contains lists of descriptive phrases in major aspects (“Atmosphere,” “Food,” “Service,” “Specialty,” and “General”) as well as ratings (e.g., “Atmosphere_rating,” “Food_rating,” “Service_rating,” and “General_rating”).
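
To make the shape of such an entry concrete, the sketch below builds one from scored phrases. The restaurant name, the phrases, and the scores are invented, and averaging the sentiment scores to obtain an aspect rating is an assumption of this example; only the field naming ("Food", "Food_rating", and so on) follows the text above.

```python
from statistics import mean


def build_entry(name, scored_phrases):
    """Build one summary entry from (aspect, phrase, sentiment score) triples."""
    entry = {"name": name}
    by_aspect = {}
    for aspect, phrase, score in scored_phrases:
        by_aspect.setdefault(aspect, []).append((phrase, score))
    for aspect, items in by_aspect.items():
        entry[aspect] = [phrase for phrase, _ in items]
        # Assumption: the aspect rating is the mean score of its phrases.
        entry[f"{aspect}_rating"] = round(mean([s for _, s in items]), 1)
    return entry


entry = build_entry(
    "Hypothetical Bistro",
    [
        ("Food", "really quite good caesar", 5.0),
        ("Food", "tasty pizza", 4.0),
        ("Service", "slow service", 2.0),
    ],
)
```

The phrase lists serve keyword-style lookups, while the numeric ratings make the entry searchable by quality.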


3.4 DIALOGUE MODELING

The system has to present the highlighted information to users via interactive conversations.

An adaptive dialogue modeling mechanism driven by the UGC summary database is required.

Users’ feature-specific queries can be handled well with keyword search.

For high-level qualitative questions, we make use of sentiment scores to convert the qualitative queries into measurable values.
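
Both query types can be sketched against a toy summary database. Everything concrete here is an assumption for illustration: the entries, the adjective scores in `ADJ_SCORE`, and the rule that a qualitative adjective acts as a lower bound on the aspect rating.

```python
# Hypothetical sentiment scores for adjectives used in queries.
ADJ_SCORE = {"great": 4.5, "good": 4.0, "decent": 3.5}

# Hypothetical summary database entries.
DATABASE = [
    {"name": "A", "Food": ["good caesar"], "Food_rating": 4.7,
     "Service": ["slow service"], "Service_rating": 3.9},
    {"name": "B", "Food": ["tasty pizza"], "Food_rating": 4.1,
     "Service": ["friendly staff"], "Service_rating": 4.6},
    {"name": "C", "Food": ["bland soup"], "Food_rating": 3.2,
     "Service": ["attentive waiter"], "Service_rating": 4.8},
]

PHRASE_FIELDS = ("Atmosphere", "Food", "Service", "Specialty", "General")


def keyword_search(database, keyword):
    """Feature-specific query: match the keyword against descriptive phrases."""
    return [
        e["name"] for e in database
        if any(keyword in p for f in PHRASE_FIELDS for p in e.get(f, []))
    ]


def search_by_quality(database, aspect, adjective):
    """Qualitative query: turn the adjective into a rating threshold."""
    threshold = ADJ_SCORE[adjective]
    key = f"{aspect}_rating"
    hits = [e for e in database if e.get(key, 0.0) >= threshold]
    hits.sort(key=lambda e: e[key], reverse=True)
    return [e["name"] for e in hits]


pizza_places = keyword_search(DATABASE, "pizza")
great_service = search_by_quality(DATABASE, "Service", "great")
good_food = search_by_quality(DATABASE, "Food", "good")
```

A question such as "which places have great service" thus becomes a numeric filter on "Service_rating" rather than a string match.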


The numerical sentiment values can be used to search the database on aspect ratings.

Fig. 7 shows an example procedure for handling qualitative queries.

When a user’s utterance is submitted to the system and passed through speech recognition, a linguistic parser parses the sentence into a linguistic frame, from which a set of key-value pairs is extracted as a meaning representation of the utterance.
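
The last step, going from a frame to key-value pairs, can be sketched as a recursive flattening. The frame below and its key names are hypothetical; they only illustrate what a meaning representation for a query about restaurants with great food might look like.

```python
def frame_to_key_values(frame: dict, prefix: str = "") -> dict:
    """Flatten a nested frame into dotted key-value pairs."""
    pairs = {}
    for key, value in frame.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.update(frame_to_key_values(value, prefix=f"{name}."))
        else:
            pairs[name] = value
    return pairs


# Hypothetical frame for an utterance asking for restaurants with great food.
query_frame = {
    "clause": "request",
    "topic": {
        "name": "restaurant",
        "property": {"aspect": "food", "quality": "great"},
    },
}

key_values = frame_to_key_values(query_frame)
```

The resulting pairs (for example the aspect "food" and the quality "great") are what a dialogue manager could hand to the database search.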


CONCLUSION

In this work, we have explored a universal framework that supports access to user-generated content, with a speech-navigated web-based interface and a generalized platform for unstructured data processing.

The contribution of this work lies in that it advances the integration of unstructured data summarization and speech-based human-computer interaction.

With the help of such dialogue systems, users can access the online community-edited information more effectively and more efficiently.


We presented a framework for preprocessing unstructured UGC data.

We proposed a parse-and-paraphrase approach to extracting representative phrases from sentences, and introduced an algorithm for assessing the degree of sentiment in opinion expressions based on user-provided ratings.

We also used a phrase classification model to select context-relevant phrases automatically.

To present the summarized information in natural responses, a dialogue-modeling framework was also introduced.




THANK YOU
