
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Feb 16, 2017



Page 1: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

The “nomenclature of multidimensionality” in the digital libraries evaluation domain

Leonidas Papachristopoulos (1,2), Giannis Tsakonas (3), Michalis Sfakakis (1), Nikos Kleidis (4), and Christos Papatheodorou (1,2)

1 Dept. of Archives, Library Science and Museology, Ionian University, Corfu, Greece

2 Digital Curation Unit, Institute for the Management of Information Systems, ‘Athena’ Research Centre, Athens, Greece

3 Library and Information Center, University of Patras, Patras, Greece

4 Dept. of Informatics, Athens University of Economics and Business, Greece

Page 2: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

“nomenclature”: a system for naming things, especially in a particular area of science


Page 3: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Introduction / aim / scope

1. We aimed to detect important topics and key persons of the Digital Library evaluation domain by applying the Latent Dirichlet Allocation (LDA) modelling technique to a corpus of conference papers:

• Source: JCDL, ECDL/TPDL & ICADL

• Period: 2001–2013

• Topics: 13 topics

2. We used network analysis centrality metrics to gain awareness of the relationships between these topics.


Page 4: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Research questions

1. What is the importance of these topics?

1a Which are the most prominent topics that have emerged in DL evaluation?

1b How do they interact with each other?

2. Which are the most important research groups or individuals in the DL evaluation domain?

3. How ‘multidimensional’ is the behavior of the researchers in the field?


Page 5: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Selection stage

• 395 papers (both full and short) from a pool of 2,001 papers were classified as DL evaluation papers by a Naïve Bayes classifier.

• The classifier was assessed by three domain experts, who achieved a high inter-rater agreement score.


Page 6: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Topic extraction stage

• The documents were converted to text.

• The texts were tokenized to construct a ‘bag of words’.

• The ‘bag of words’ was filtered to exclude stop words and to remove all frequent (>2,100) and rare (<5) words.

• A vocabulary of 38,298 unique terms and 742,224 tokens was formed.

• Each paper contributes on average 1,879 tokens.
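The filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list, the tokenizer, and the function names are hypothetical, and the thresholds are assumed to be corpus-wide occurrence counts (the slides give only the numbers 2,100 and 5).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(texts, min_count=5, max_count=2100):
    """Drop stop words, then drop words seen fewer than min_count
    or more than max_count times across the whole corpus (assumed meaning)."""
    docs = [[t for t in tokenize(text) if t not in STOP_WORDS] for text in texts]
    counts = Counter(t for doc in docs for t in doc)
    vocabulary = {w for w, c in counts.items() if min_count <= c <= max_count}
    filtered = [[t for t in doc if t in vocabulary] for doc in docs]
    return vocabulary, filtered

# Toy corpus with toy thresholds, only to show the mechanics.
texts = [
    "the library evaluation of the library rare",
    "evaluation of search",
    "library search evaluation",
]
vocabulary, filtered = build_vocabulary(texts, min_count=2, max_count=2)
```

With these toy thresholds only "search" survives: "rare" is too infrequent, while "library" and "evaluation" are too frequent. The reported average follows directly from the corpus figures: 742,224 tokens over 395 papers is roughly 1,879 tokens per paper.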


Page 7: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Topic modelling stage 1/2

• Topic modeling analyzes large quantities of unlabeled data.

• A topic is a probability distribution over a collection of words.

• Each document is a random composition of a number of topics.
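A small sketch may make the two definitions concrete. The topic names, words, and probabilities below are invented for illustration; only the structure (a topic is a distribution over words, a document is a mixture of topics) comes from the slide.

```python
# Two hypothetical topics, each a probability distribution over a tiny vocabulary.
topics = {
    "retrieval": {"query": 0.5, "ranking": 0.4, "user": 0.1},
    "usability": {"query": 0.1, "ranking": 0.1, "user": 0.8},
}

# Hypothetical topic proportions for a single document (they sum to 1).
doc_mixture = {"retrieval": 0.7, "usability": 0.3}

def word_probability(word, mixture, topics):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(weight * topics[name].get(word, 0.0) for name, weight in mixture.items())

p_user = word_probability("user", doc_mixture, topics)  # 0.7 * 0.1 + 0.3 * 0.8
```

Topic modelling runs this picture in reverse: given only the observed words, it estimates both the topic distributions and each document's mixture.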


Page 8: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Topic modelling stage 2/2

• Our texts were imported into Mimno’s jsLDA (javascriptLDA) tool.

• 1,000 training iterations were run to achieve a stable structure of topics.

• Several tests were executed to specify the optimal interpretable number of topics.

• Three domain experts examined the word structure of each topic.

• The optimal interpretable number of topics was found to be thirteen (13).


Page 9: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Topics correlation

• jsLDA offers a topic correlation functionality based on the Pointwise Mutual Information (PMI) indicator.

• PMI compares the probability of two topics co-occurring in a document with the probability expected if each topic appeared in that document independently of the other.

• The result was a graph with 13 nodes (topics) and 36 edges (correlation probabilities).
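As a rough illustration of the idea, the sketch below uses the textbook definition PMI(a, b) = log(p(a, b) / (p(a) p(b))) over per-document topic sets and keeps an edge whenever the score exceeds a threshold. How jsLDA computes its correlations internally is not described on the slide, so the input representation, the function names, and the threshold are all assumptions.

```python
import math
from itertools import combinations

def pmi(docs_topics, a, b):
    """Pointwise mutual information of topics a and b,
    where docs_topics is a list of sets of topics present in each document."""
    n = len(docs_topics)
    p_a = sum(a in d for d in docs_topics) / n
    p_b = sum(b in d for d in docs_topics) / n
    p_ab = sum(a in d and b in d for d in docs_topics) / n
    if p_ab == 0:
        return float("-inf")  # the two topics never co-occur
    return math.log(p_ab / (p_a * p_b))

def correlation_edges(docs_topics, threshold=0.0):
    """Return {(topic_a, topic_b): pmi} for pairs scoring above the (hypothetical) threshold."""
    topics = sorted(set().union(*docs_topics))
    edges = {}
    for a, b in combinations(topics, 2):
        score = pmi(docs_topics, a, b)
        if score > threshold:
            edges[(a, b)] = score
    return edges

# Toy data: four documents and the topics assumed to be present in each.
docs_topics = [{"A", "B"}, {"A", "B"}, {"C"}, {"A", "C"}]
edges = correlation_edges(docs_topics)
```

In this toy example only the pair (A, B) co-occurs more often than independence would predict, so it is the only edge kept.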


Page 10: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 1a: Topics significance - metrics

• Degree centrality: the ability of one topic to communicate on a semantic level with others

• Closeness centrality: the ability of one topic to directly connect with others

• Betweenness centrality: the ability of a topic to stand in a central position and bridge other topics

• Clustering Coefficient: localization of topics clusters
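For reference, the four metrics can be computed on a small undirected graph as sketched below. These are the standard graph-theoretic definitions written with the standard library only; the toy graph and function names are hypothetical, closeness is expressed here as the mean shortest-path length to the other nodes (an assumption about the convention), and betweenness is left unnormalized.

```python
from collections import deque
from itertools import combinations

def bfs(graph, source):
    """Return shortest-path distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma

def degree(graph, node):
    """Number of direct neighbours."""
    return len(graph[node])

def average_distance(graph, node):
    """Mean shortest-path length from node to every other reachable node."""
    dist, _ = bfs(graph, node)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others)

def betweenness(graph, node):
    """Unnormalized betweenness: share of shortest paths between other pairs passing through node."""
    info = {n: bfs(graph, n) for n in graph}
    dist_v, sigma_v = info[node]
    total = 0.0
    for s, t in combinations([n for n in graph if n != node], 2):
        dist_s, sigma_s = info[s]
        if node in dist_s and t in dist_s and t in dist_v:
            if dist_s[node] + dist_v[t] == dist_s[t]:
                total += sigma_s[node] * sigma_v[t] / sigma_s[t]
    return total

def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

# Toy graph: a triangle A-B-C with a pendant node D attached to A.
graph = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
```

In the toy graph, A is the only node that bridges D to the rest, which is why it is the only node with non-zero betweenness.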


Page 11: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 1a: Topics significance

Topic                    Degree Centrality   Closeness Centrality   Betweenness Centrality   Clustering Coefficient
Distributed Services     5                   1.58                   2.75                     0.20
Educational Content      4                   1.67                   0.33                     0.83
Information Retrieval    6                   1.50                   2.08                     0.60
Information Seeking      11                  1.08                   19.92                    0.36
Interface Usability      5                   1.58                   1.00                     0.70
Multimedia               4                   1.67                   1.00                     0.67
Metadata Quality         5                   1.58                   3.03                     0.40
Preservation             4                   1.67                   0.45                     0.67
Reading Behavior         6                   1.50                   2.17                     0.60
Recommendation Systems   5                   1.58                   0.78                     0.70
Search Engines           5                   1.58                   2.95                     0.40
Similarity Performance   5                   1.58                   1.17                     0.70
Text Classification      7                   1.42                   4.37                     0.52
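The figures in this table can be cross-checked with a little arithmetic. The sketch below rests on two inferences that the slides do not state: that the closeness column reports the mean shortest-path length to the other 12 topics, and that no two topics are more than two hops apart. Under those assumptions every closeness value follows from the degree alone, and the degrees sum to twice the 36 edges of the topic graph.

```python
# Degree and closeness values copied from the table.
degrees = {
    "Distributed Services": 5, "Educational Content": 4, "Information Retrieval": 6,
    "Information Seeking": 11, "Interface Usability": 5, "Multimedia": 4,
    "Metadata Quality": 5, "Preservation": 4, "Reading Behavior": 6,
    "Recommendation Systems": 5, "Search Engines": 5, "Similarity Performance": 5,
    "Text Classification": 7,
}
closeness = {
    "Distributed Services": 1.58, "Educational Content": 1.67, "Information Retrieval": 1.50,
    "Information Seeking": 1.08, "Interface Usability": 1.58, "Multimedia": 1.67,
    "Metadata Quality": 1.58, "Preservation": 1.67, "Reading Behavior": 1.50,
    "Recommendation Systems": 1.58, "Search Engines": 1.58, "Similarity Performance": 1.58,
    "Text Classification": 1.42,
}

# Each undirected edge contributes to the degree of two nodes.
edge_count = sum(degrees.values()) // 2

def expected_average_distance(degree, n_nodes=13):
    """Mean distance if neighbours are 1 hop away and all remaining nodes 2 hops away (assumption)."""
    others = n_nodes - 1
    return (degree * 1 + (others - degree) * 2) / others

# Largest gap between the assumed formula and the reported closeness values.
max_gap = max(abs(expected_average_distance(degrees[t]) - closeness[t]) for t in degrees)
```

The largest gap stays within rounding error, so lower closeness values in the table indicate more central topics, with Information Seeking (1.08) the most central.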


Page 12: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 1b: Topics interaction

-1-

• Reading behavior

• Information seeking

• Interface usability

• Metadata quality

• Educational content

-2-

• Information retrieval

• Search engines

• Text classification

• Similarity performance

• Recommendation systems

• Information seeking

• Two main subgraphs, based on PMI and clustering coefficient


Page 13: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 2: authors contribution

• Our corpus consists of 395 papers by 905 unique authors.

• An author may participate in more than one paper; thus, the total number of author participations equals 1,335.

• A paper has an average of 3.38 author participations.

• An author participates on average 1.47 times in the papers.
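Both averages follow from the three counts on this slide, as the short calculation below shows (the variable names are only illustrative):

```python
papers = 395
unique_authors = 905
participations = 1335

# Average number of author participations per paper (about 3.38).
authors_per_paper = participations / papers

# Average number of papers per unique author (about 1.475, in line with the reported 1.47).
papers_per_author = participations / unique_authors
```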


Page 14: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 2: authors contribution

Topic                    Authors per paper
Educational content      4.40
Metadata quality         3.82
Distributed services     3.58
Similarity performance   3.45
Interface usability      3.44
Multimedia               3.41
Information seeking      3.37
Recommendation systems   3.27
Search engines           3.19
Information retrieval    3.02
Text classification      3.01
Preservation             2.93
Reading behavior         2.88


Page 15: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

RQ 3: authors’ multidimensionality


• An author contributes to one or more topics.

• 3 topics: 382 authors

• 2 topics: 207 authors

• 1 topic: 37 authors

Page 16: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Summary

1. We applied Latent Dirichlet Allocation (LDA) to a corpus of papers to identify key topics of the DL evaluation domain.

• We created a topic map of the domain and helped to discover groups of authors that have impact on several topics.

2. We used Network Analysis centrality metrics to gain awareness of the structure, relationships and information flows.

• We revealed bipartite relationships between key topics and key authors/groups of the DL evaluation domain.


Page 17: The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Thank you for your attention

Questions?

Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19

Session: Digital Library Evaluation

Time: Thursday, 08/Sep/2016, 9:00am - 10:30am

Chair: Claus-Peter Klas

Location: Blauer Saal, Hannover Congress Centrum
