Page 1: An In-depth Analysis of Tags and Controlled Metadata for Book Search

AN IN-DEPTH ANALYSIS OF TAGS AND CONTROLLED METADATA FOR BOOK SEARCH

TOINE BOGERS · VIVIEN PETRAS

MARCH 23, 2017 · iCONFERENCE 2017

Page 2: An In-depth Analysis of Tags and Controlled Metadata for Book Search

OUTLINE

▸ Introduction

▸ Methodology & Experimental Setup

▸ Analysis

– Tags vs. Controlled Vocabularies

– Book Search Requests

– Failure Analysis

▸ Conclusions & Future Work


Page 3: An In-depth Analysis of Tags and Controlled Metadata for Book Search

INTRODUCTION

Page 4: An In-depth Analysis of Tags and Controlled Metadata for Book Search

MOTIVATION

▸ Readers often struggle to discover new books with existing systems (e.g., library catalogs, Amazon, eBook sellers)

– Information needs are contextual, personal & complex

– Book metadata does not contain the necessary information


Page 5: An In-depth Analysis of Tags and Controlled Metadata for Book Search

EARLIER WORK

▸ iConference 2015

– Tags outperform controlled vocabularies for search, but controlled vocabularies are sometimes better.

– Controlled vocabularies contain more unique terms; tags show more repetition of terms.

▸ Why?

– Terminology

– Popularity / frequency

– Type of request


Page 6: An In-depth Analysis of Tags and Controlled Metadata for Book Search

STUDY OBJECTIVES

▸ Why are tags better than controlled vocabularies for book search?

– Which types of book search requests are better addressed using tags and which using CV?

– Which book search requests fail completely and what characterizes such requests?


Page 7: An In-depth Analysis of Tags and Controlled Metadata for Book Search

METHODOLOGY & EXPERIMENTAL SETUP

Page 8: An In-depth Analysis of Tags and Controlled Metadata for Book Search

EXPERIMENTAL SETUP

▸ Controlled Vocabulary content (CV)

– DDC class labels

– Subjects

– Geographic names

– Category labels

– LCSH terms

▸ Tags

– Each tag occurs as many times as it has been assigned by the users

▸ Unique tags

– Each tag occurs only once
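The three document representations above can be sketched as follows; a minimal illustration in which the field names and the sample record are hypothetical, not taken from the actual collection:

```python
# Hypothetical book record with CV fields and user tags with counts.
record = {
    "cv": ["Fiction", "Detective and mystery stories", "England"],
    "tags": {"mystery": 12, "crime": 7, "british": 3},
}

# CV representation: the controlled-vocabulary terms as-is.
cv_text = " ".join(record["cv"])

# Tags: each tag repeated as often as users assigned it.
tags_text = " ".join(t for t, n in record["tags"].items() for _ in range(n))

# Unique tags: each tag exactly once, discarding frequency information.
unique_tags_text = " ".join(record["tags"])

print(len(tags_text.split()))         # 22 tokens (12 + 7 + 3)
print(len(unique_tags_text.split()))  # 3 tokens
```

The only difference between the second and third representation is whether tag frequency survives into the index.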


Page 9: An In-depth Analysis of Tags and Controlled Metadata for Book Search

AMAZON/LIBRARYTHING COLLECTION

[Figure: composition of the Amazon/LibraryThing collection. Each record carries Tags (with frequencies), Controlled Vocabulary content (DDC class labels, subjects, geographic names, category labels, LCSH terms), and Unique tags (each tag once per record).]

Page 10: An In-depth Analysis of Tags and Controlled Metadata for Book Search

ANNOTATED LT TOPIC

[Figure: an annotated LibraryThing forum topic, with the topic title, narrative, and recommended books highlighted.]

Page 11: An In-depth Analysis of Tags and Controlled Metadata for Book Search

EXPERIMENTAL SETUP

▸ Amazon / LibraryThing collection of book records

– 2 million records

▸ LibraryThing forum topics for search requests

– 334 search requests for testing

▸ Relevance judgements

– Recommendations from LT members with graded relevance scoring (highest relevance if the book is added by the searcher)

▸ Evaluation metric

– Normalized Discounted Cumulated Gain (NDCG@10)

▸ IR system

– Indri 5.4 toolkit

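The evaluation metric can be sketched as a few lines of Python. This is a simplified illustration, not the paper's actual evaluation code, and it computes the ideal ranking from the judged gains of the retrieved list only:

```python
import math

def dcg(gains):
    # Discounted cumulated gain: the gain at rank i (0-based) is
    # divided by log2(i + 2), so later ranks contribute less.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, k=10):
    # Normalize the DCG of the top-k results by the ideal DCG,
    # i.e. the DCG of the same gains sorted in descending order.
    ideal = dcg(sorted(ranked_gains, reverse=True)[:k])
    if ideal == 0:
        return 0.0  # nothing relevant at all: a completely failed request
    return dcg(ranked_gains[:k]) / ideal

# Graded gains for one hypothetical request: 2 = book the searcher
# actually added, 1 = book recommended by LT members, 0 = not relevant.
print(round(ndcg_at_k([2, 0, 1, 0, 0, 0, 0, 0, 0, 0]), 4))
```

A request whose top 10 contains no relevant book at all scores 0.0, which is how "failed" requests surface in the later analysis.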

Page 12: An In-depth Analysis of Tags and Controlled Metadata for Book Search

ANALYSIS

Page 13: An In-depth Analysis of Tags and Controlled Metadata for Book Search

TAGS vs. CONTROLLED VOCABULARIES

▸ Question 1: Is there a difference in performance between CV and Tags in retrieval?

▸ Answer

– Tags perform significantly better than CV

– The combination of both results in even better performance than just tags, but not significantly so

– Losing tag frequency information helps rather than hurts performance (also not significantly)


Page 14: An In-depth Analysis of Tags and Controlled Metadata for Book Search

TAGS vs. CONTROLLED VOCABULARIES

▸ Question 2: Do tags outperform CV because of the so-called popularity effect?

▸ Answer

– No, there does not seem to be a popularity effect

– Types = unique words in a record

– Tokens = all instances of words in a record
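The type/token distinction above can be illustrated with a short sketch; the tag list here is invented purely for illustration:

```python
from collections import Counter

# Hypothetical tag assignments for one book record.
tokens = ["fantasy", "fantasy", "dragons", "fantasy", "dragons", "epic"]

types = set(tokens)       # unique words in the record
counts = Counter(tokens)  # frequency of each type

print(len(tokens))        # 6 tokens
print(len(types))         # 3 types
print(counts["fantasy"])  # 3
# A high token-to-type ratio means heavy repetition, as with tags;
# CV fields consist mostly of unique terms (ratio close to 1).
```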


Page 15: An In-depth Analysis of Tags and Controlled Metadata for Book Search

TAGS vs. CONTROLLED VOCABULARIES

▸ Question 3: Do Tags and CV complement or cancel each other out?

▸ Answer

– Tags and CV complement each other: they are successful on different sets of requests

– But most zero-difference requests (74.0%) actually fail completely! When and why?


Page 16: An In-depth Analysis of Tags and Controlled Metadata for Book Search

REQUESTS – RELEVANCE ASPECTS

▸ What makes a suggested book relevant to the user?

– Distinguish between eight relevance aspects (Reuter, 2007; Koolen et al., 2015)


Page 17: An In-depth Analysis of Tags and Controlled Metadata for Book Search

REQUESTS – RELEVANCE ASPECTS

Aspect | Description | % of requests (N = 87)

Accessibility | Language, length, or level of difficulty of a book | 9.2%

Content | Topic, plot, genre, style, or comprehensiveness | 79.3%

Engagement | Fit a certain mood or interest, are considered high quality, or provide a certain reading experience | 25.3%

Familiarity | Similar to known books or related to a previous experience | 47.1%

Known-item | The user is trying to identify a known book, but cannot remember the metadata that would locate it | 12.6%

Metadata | With a certain title or by a certain author or publisher, in a particular format, or certain year | 23.0%

Novelty | Unusual or quirky, or containing novel content | 3.4%

Socio-cultural | Related to the user's socio-cultural background or values; popular or obscure | 13.8%


Page 18: An In-depth Analysis of Tags and Controlled Metadata for Book Search

REQUESTS – RELEVANCE ASPECTS

▸ Question 4: What types of book requests are best served by the Unique tags and CV collections?

▸ Answer

– CV terms show a tendency to work best for requests that touch upon aspects of engagement

– Other requests are best served by Unique tags


Page 19: An In-depth Analysis of Tags and Controlled Metadata for Book Search

REQUESTS – RELEVANCE ASPECTS

iConference 2017 [Petras & Bogers]

[Table, digits lost in extraction: distribution of the eight relevance aspects over the 87 successful book requests, over the requests where Unique tags outperform CV terms by a set margin, and over the requests where CV terms outperform Unique tags by that margin. More than one aspect can apply to a single book request, so the percentages do not add up to 100%.]

Performance grouped by relevance aspect (average NDCG@10 over all requests expressing that aspect, Unique tags vs. CV):

Accessibility (N = 7): Unique tags 0.1235 | CV 0.0749
Content (N = 63): Unique tags 0.1965 | CV 0.0821
Engagement (N = 21): Unique tags 0.1121 | CV 0.1425
Familiarity (N = 36): Unique tags 0.1833 | CV 0.0701
Known-item (N = 11): Unique tags 0.3593 | CV 0.1818
Metadata (N = 17): Unique tags 0.2454 | CV 0.1259
Novelty (N = 2): Unique tags 0.5304 | CV 0.0000
Socio-cultural (N = 10): Unique tags 0.1127 | CV 0.0428
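Grouping per-request scores by relevance aspect, as in the figure, can be sketched as follows; the result tuples here are a small invented subset, since a single request can carry several aspects at once:

```python
from collections import defaultdict

# Hypothetical (request_id, aspects, ndcg) tuples.
results = [
    (1, {"Content", "Familiarity"}, 0.42),
    (2, {"Content"}, 0.10),
    (3, {"Known-item"}, 0.80),
]

# Each request contributes its score to every aspect it expresses.
by_aspect = defaultdict(list)
for _, aspects, score in results:
    for aspect in aspects:
        by_aspect[aspect].append(score)

averages = {a: sum(s) / len(s) for a, s in by_aspect.items()}
print(averages)
```

Because requests contribute to multiple aspect groups, the per-aspect Ns overlap and sum to more than the number of requests.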


Page 20: An In-depth Analysis of Tags and Controlled Metadata for Book Search

REQUESTS – TYPE OF BOOK

▸ Question 5: What types of book requests (fiction or non-fiction) are best served by Unique tags or CV?

▸ Answer

– Unique tags work significantly better for fiction

– CV work better for non-fiction (but not significantly so)


Page 21: An In-depth Analysis of Tags and Controlled Metadata for Book Search

FAILURE ANALYSIS

▸ Question 6: Do failed book search requests fail because of data sparsity, a lower recall base, or a lack of examples?

▸ Answer

– Neither sparsity nor the size of the recall base are the reason for retrieval failure

– The number of examples provided by the requester has significant positive influence on performance

[Figure: of the N = 334 requests in total, 87 succeeded and 247 failed.]


Page 22: An In-depth Analysis of Tags and Controlled Metadata for Book Search

FAILURE ANALYSIS

▸ Question 7: Do book search requests fail because of their relevance aspects?

▸ Answer

– No, relevance aspects are distributed roughly equally over successful & failed requests

– Only Accessibility- and Metadata-related search requests seem to fail more often


Page 23: An In-depth Analysis of Tags and Controlled Metadata for Book Search

FAILURE ANALYSIS

▸ Question 8: Does the type of book that is being requested (fiction vs. non-fiction) have an influence on whether requests succeed or fail?

▸ Answer

– Requests for works of fiction fail significantly more often


Page 24: An In-depth Analysis of Tags and Controlled Metadata for Book Search

CONCLUSIONS & FUTURE WORK

Page 25: An In-depth Analysis of Tags and Controlled Metadata for Book Search

FINDINGS

▸ Tags outperform CV...

– ...probably because their terminology is closer to the user's language (not because of the popularity effect)

▸ Sometimes CV are better, for example, for non-fiction books...

– ...whereas tags are better for fiction and for content-related, familiarity or known-item searches

▸ We believe that tags are simply better able to match the user's language when looking for books

– Although they are still not that great at it!

– Book search is still hard, especially for fiction books


Page 26: An In-depth Analysis of Tags and Controlled Metadata for Book Search

OPEN QUESTIONS

▸ How can book metadata be adapted to be closer to the vocabulary used in real-world book search requests?

▸ What other aspects (besides type of requested book or relevance aspect of search request) contribute to request difficulty?

▸ Our question to you:

– What other questions can we ask of this data?


Page 27: An In-depth Analysis of Tags and Controlled Metadata for Book Search

QUESTIONS?

Paper URL: http://bit.ly/iconf2017