A new software tool for large-scale analysis of citation networks. Nees Jan van Eck, Centre for Science and Technology Studies (CWTS), Leiden University. Workshop “Measuring the Diversity of Research”, Berlin, September 2, 2013
Transcript
Page 1: A new software tool for large-scale analysis of citation networks

A new software tool for large-scale analysis of citation networks

Nees Jan van Eck

Centre for Science and Technology Studies (CWTS), Leiden University

Workshop “Measuring the Diversity of Research”, Berlin

September 2, 2013

Page 2: A new software tool for large-scale analysis of citation networks

Today’s talk

• Part 1: CWTS research program on bibliometric network analysis

– VOSviewer

– VOS mapping and clustering

– Large-scale modularity optimization

– Algorithmically constructed publication-level classification system

• Part 2: New software tool for large-scale analysis of citation networks

Page 3: A new software tool for large-scale analysis of citation networks

CWTS research program on bibliometric network analysis

Part 1

Page 4: A new software tool for large-scale analysis of citation networks

VOSviewer (1)

(Van Eck & Waltman, Scientometrics, 2010)

Page 5: A new software tool for large-scale analysis of citation networks

VOSviewer (2)

(Van Eck & Waltman, Scientometrics, 2010)

Page 6: A new software tool for large-scale analysis of citation networks

Subject categories

Page 7: A new software tool for large-scale analysis of citation networks

Leiden University

Page 8: A new software tool for large-scale analysis of citation networks

Erasmus University Rotterdam

Page 9: A new software tool for large-scale analysis of citation networks

Delft University of Technology

Page 10: A new software tool for large-scale analysis of citation networks

Clinical Neurology

Page 11: A new software tool for large-scale analysis of citation networks

Clinical Neurology: Citation density

(Van Eck et al., PLoS ONE, 2013)

Page 12: A new software tool for large-scale analysis of citation networks

Clinical Neurology: Reference density

Page 13: A new software tool for large-scale analysis of citation networks

VOS mapping and clustering

• Mapping and clustering are commonly used bibliometric network analysis techniques

• Mapping:

– Assigning the nodes in a network to locations in a (usually two-dimensional) space

– VOS mapping technique has been developed specifically for mapping bibliometric networks

• Clustering:

– Partitioning the nodes in a network into a number of groups (a.k.a. community detection)

– VOS clustering technique has been developed to be used jointly with the VOS mapping technique in a unified technical framework

Page 14: A new software tool for large-scale analysis of citation networks

Unified approach: Clustering seen as mapping in a restricted space

Page 15: A new software tool for large-scale analysis of citation networks

Unified approach: Clustering seen as mapping in a restricted space

Page 16: A new software tool for large-scale analysis of citation networks

Unified approach to mapping and clustering

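Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2 m c_{ij}}{c_i c_j} d_{ij}^2 - \sum_{i<j} d_{ij}$$

where

$n$: number of nodes in the network

$m$: number of links in the network

$c_{ij}$: number of links between nodes $i$ and $j$

$c_i$: number of links of node $i$

Mapping

$x_i$: vector denoting the location of node $i$ in a $p$-dimensional map

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering

$x_i$: integer denoting the cluster to which node $i$ belongs

$\gamma$: resolution parameter

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$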
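
As a rough illustration of the unified objective, the sketch below evaluates Q = Σ_{i<j} (2m·c_ij / (c_i·c_j))·d_ij² − Σ_{i<j} d_ij on a tiny network, once with the Euclidean (mapping) distance and once with the 0 or 1/γ (clustering) distance. The function names and the four-node path network are assumptions made for this example; nothing here is taken from the VOSviewer code.

```python
import math
from itertools import combinations


def quality(n, links, dist):
    """Q = sum_{i<j} (2*m*c_ij / (c_i*c_j)) * d_ij**2 - sum_{i<j} d_ij.

    links maps a pair (i, j) with i < j to c_ij; dist(i, j) returns d_ij.
    Assumes every node has at least one link, so that c_i > 0.
    """
    m = sum(links.values())          # m: number of links in the network
    c = [0.0] * n                    # c[i]: number of links of node i
    for (i, j), c_ij in links.items():
        c[i] += c_ij
        c[j] += c_ij
    q = 0.0
    for i, j in combinations(range(n), 2):
        d = dist(i, j)
        q += 2.0 * m * links.get((i, j), 0.0) / (c[i] * c[j]) * d * d - d
    return q


def mapping_distance(x):
    """Euclidean distance between the map locations x[i] and x[j]."""
    return lambda i, j: math.dist(x[i], x[j])


def clustering_distance(x, gamma=1.0):
    """0 if nodes i and j share a cluster, 1/gamma otherwise."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma


# Hypothetical toy network: a path 0 - 1 - 2 - 3.
links = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
q_map = quality(4, links, mapping_distance([(0.0,), (1.0,), (2.0,), (3.0,)]))
q_two = quality(4, links, clustering_distance([0, 0, 1, 1]))  # two clusters
q_one = quality(4, links, clustering_distance([0, 0, 0, 0]))  # one cluster
q_all = quality(4, links, clustering_distance([0, 1, 2, 3]))  # singletons
```

With γ = 1, splitting the path into {0, 1} and {2, 3} gives a lower (better) Q than either a single cluster or four singletons, which is the behaviour one would expect from a community detection objective.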

Page 17: A new software tool for large-scale analysis of citation networks

Unified approach: Mapping

• Equivalent to the VOS mapping technique

• Closely related to multidimensional scaling (Van Eck et al., JASIST, 2010)

Page 18: A new software tool for large-scale analysis of citation networks

Unified approach: Clustering

• Equivalent to a weighted and parameterized variant of modularity-based clustering (Waltman et al., JOI, 2010)

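• Parameter $\gamma$ makes it possible to customize the granularity level of the clustering

Maximize

$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j) \, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)$$

where $\delta(x_i, x_j)$ equals 1 if $x_i = x_j$ and 0 otherwise, and

$$w_{ij} = \frac{2m}{c_i c_j}$$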
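
The equivalence claimed on this slide can be checked in a few lines. The derivation below is a sketch that uses only the definitions from the unified approach: the objective Q to be minimized, the clustering distance d_ij (0 within a cluster, 1/γ between clusters), and the weights w_ij = 2m / (c_i c_j). The shorthand s_ij and δ_ij is introduced here for convenience.

```latex
% Shorthand: s_{ij} = 2 m c_{ij} / (c_i c_j) = w_{ij} c_{ij},
%            \delta_{ij} = \delta(x_i, x_j).
% In the clustering case, d_{ij} = (1 - \delta_{ij}) / \gamma.
\begin{align*}
Q &= \sum_{i<j} s_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}
   = \sum_{i<j} (1 - \delta_{ij})
     \left( \frac{s_{ij}}{\gamma^2} - \frac{1}{\gamma} \right) \\
  &= \frac{1}{\gamma^2} \sum_{i<j} (s_{ij} - \gamma)
   - \frac{1}{\gamma^2} \sum_{i<j} \delta_{ij} (s_{ij} - \gamma) \\
  &= \text{const}
   - \frac{1}{\gamma^2} \sum_{i<j} \delta_{ij} \, w_{ij}
     \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)
   = \text{const} - \frac{2m}{\gamma^2} \, \hat{Q}
\end{align*}
```

The first sum in the second line does not depend on the cluster assignment, and 2m/γ² is positive, so minimizing Q over cluster assignments yields the same solutions as maximizing Q̂.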

Page 19: A new software tool for large-scale analysis of citation networks

Large-scale modularity optimization

• Modularity optimization is one of the most popular approaches to clustering in networks

• Several variants of the original modularity function have been proposed, supporting for instance weighted networks and different resolution levels

• Optimization of modularity functions in large networks (with millions of nodes and edges) has received only limited attention but has important applications in bibliometrics

Page 20: A new software tool for large-scale analysis of citation networks

New algorithm for large-scale modularity optimization

• The ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization

• Our proposed ‘smart local moving algorithm’ can be seen as an enhanced version of this algorithm (Waltman & Van Eck, 2013)
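
The slides do not spell out the steps of either algorithm. As background, both build on a local moving heuristic: nodes are repeatedly moved to the neighbouring cluster that gives the largest modularity gain until no move helps. The sketch below implements only that heuristic, for an unweighted, undirected graph without self-loops; it is neither the full Louvain algorithm (which also aggregates clusters into a reduced network) nor the smart local moving algorithm. All names and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(adj, labels, gamma=1.0):
    """Modularity with resolution parameter gamma (undirected, unweighted)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    within = defaultdict(int)   # within[c]: twice the number of links inside c
    total = defaultdict(int)    # total[c]: sum of the degrees of nodes in c
    for u, nbrs in adj.items():
        total[labels[u]] += len(nbrs)
        within[labels[u]] += sum(1 for v in nbrs if labels[v] == labels[u])
    return sum(within[c] / two_m - gamma * (total[c] / two_m) ** 2
               for c in total)


def local_moving(adj, gamma=1.0):
    """Move nodes between clusters until no single move raises modularity."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    labels = {u: u for u in adj}             # start from singleton clusters
    total = {u: len(adj[u]) for u in adj}    # degree sum per cluster label
    moved = True
    while moved:
        moved = False
        for u in adj:
            k_u, old = len(adj[u]), labels[u]
            total[old] -= k_u                # take u out of its cluster
            links_to = defaultdict(int)      # links from u to each cluster
            for v in adj[u]:
                links_to[labels[v]] += 1

            def gain(c):
                # Modularity gain of putting u into c, up to a constant
                # factor and a term that is the same for every c.
                return links_to[c] - gamma * k_u * total[c] / two_m

            best = old
            for c in list(links_to):
                if gain(c) > gain(best):     # strict: ties keep u in place
                    best = c
            total[best] += k_u
            labels[u] = best
            moved = moved or best != old
    return labels


# Hypothetical toy graph: two triangles joined by the link 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = local_moving(adj)
q = modularity(adj, labels)
```

On this toy graph the heuristic recovers the two triangles as clusters. On large networks a single round of local moving typically gets stuck in poor local optima, which is why the algorithms on this slide add further steps on top of it.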

Page 21: A new software tool for large-scale analysis of citation networks

Louvain algorithm

Q = 0.3791

Q = 0.4151

Page 22: A new software tool for large-scale analysis of citation networks

Smart local moving algorithm

Q = 0.4198

Q = 0.3791

Page 23: A new software tool for large-scale analysis of citation networks

Comparison

Network (nodes / edges)        Measure   Louvain   Smart local moving
Amazon (0.5M / 0.9M)           Qmin      0.9257    0.9335
                               Qmax      0.9264    0.9338
                               t         6         28
DBLP (0.4M / 1.0M)             Qmin      0.8203    0.8357
                               Qmax      0.8227    0.8367
                               t         7         26
IMDb (0.4M / 15.0M)            Qmin      0.6976    0.7050
                               Qmax      0.7041    0.7077
                               t         18        100
LiveJournal (4.0M / 34.7M)     Qmin      0.7441    0.7676
                               Qmax      0.7557    0.7720
                               t         350       1 549
WoS (10.6M / 104.5M)           Qmin      0.7714    0.7918
                               Qmax      0.7786    0.7957
                               t         6 800     19 994
Web uk-2005 (39.5M / 783.0M)   Qmin      0.9793    0.9801
                               Qmax      0.9795    0.9801
                               t         11 006    17 074
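
To make the trade-off in this comparison easier to read, the short sketch below computes, per network, the gain in Qmax and the ratio of the t values of the smart local moving algorithm relative to the Louvain algorithm. The numbers are copied from the comparison; the slide does not state the unit of t, so only the ratio is reported. Variable names are assumptions made for this example.

```python
# (Qmax, t) per algorithm, copied from the comparison on this slide.
results = {
    "Amazon":      {"louvain": (0.9264, 6),     "slm": (0.9338, 28)},
    "DBLP":        {"louvain": (0.8227, 7),     "slm": (0.8367, 26)},
    "IMDb":        {"louvain": (0.7041, 18),    "slm": (0.7077, 100)},
    "LiveJournal": {"louvain": (0.7557, 350),   "slm": (0.7720, 1549)},
    "WoS":         {"louvain": (0.7786, 6800),  "slm": (0.7957, 19994)},
    "Web uk-2005": {"louvain": (0.9795, 11006), "slm": (0.9801, 17074)},
}


def compare(entry):
    """Return (Qmax gain, t ratio) of smart local moving versus Louvain."""
    (q_louvain, t_louvain), (q_slm, t_slm) = entry["louvain"], entry["slm"]
    return q_slm - q_louvain, t_slm / t_louvain


summary = {name: compare(entry) for name, entry in results.items()}
for name, (gain, ratio) in summary.items():
    print(f"{name:12s} Qmax gain {gain:+.4f}   t ratio {ratio:.1f}x")
```

In every network the smart local moving algorithm reaches a higher Qmax and also has a larger t than the Louvain algorithm.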

Page 24: A new software tool for large-scale analysis of citation networks

Classification systems of scientific publications

• Web of Science/Scopus journal subject categories:

– Scientific fields defined at the level of journals rather than individual publications

– Difficulties with multidisciplinary journals

– High level of aggregation

– Sometimes outdated or inaccurate

• Disciplinary classification systems:

– E.g., CA, JEL, MeSH, PACS

– Not available for all disciplines

– Sometimes outdated or inaccurate

Page 25: A new software tool for large-scale analysis of citation networks

Algorithmic classification systems (Waltman & Van Eck, JASIST, 2012)

• Why not algorithmically construct a classification system of science?

• We cluster publications (not journals) into fields based on citation relations

• Only direct citation relations are used; no co-citation or bibliographic coupling relations

• Fields are defined at different levels of granularity and are organized hierarchically
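
As a small illustration of what "only direct citation relations" means in practice, the sketch below turns a list of (citing, cited) pairs into a publication-level network. Treating a citation as an undirected link and dropping duplicates and self-citations are assumptions made for this example; the identifiers are hypothetical.

```python
from collections import defaultdict


def direct_citation_network(citations):
    """Map each publication to the set of publications it cites or is cited by.

    citations: iterable of (citing, cited) pairs.
    Direction is ignored (an assumption of this sketch); duplicate pairs
    collapse into one link and self-citations are skipped.
    """
    neighbours = defaultdict(set)
    for citing, cited in citations:
        if citing == cited:
            continue
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)
    return dict(neighbours)


# Hypothetical identifiers; ("p4", "p3") appears twice on purpose.
citations = [("p3", "p1"), ("p3", "p2"), ("p4", "p3"), ("p4", "p3"), ("p5", "p5")]
network = direct_citation_network(citations)
```

No co-citation or bibliographic coupling links are derived here: p1 and p2 are both cited by p3 but remain unconnected to each other.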

Page 26: A new software tool for large-scale analysis of citation networks

Example

• 10.2 million publications from the period 2001–2010 indexed in Web of Science

• 97.6 million direct citation relations

• Classification system of 3 hierarchical levels:

– 20 broad disciplines

– 672 fields

– 22,412 subfields

• Clustering by optimizing a variant of the standard modularity function that accounts for differences across fields in citation practices
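
The slide does not give the formula for this variant. One way such a correction could work, assumed here purely for illustration, is to divide each publication's citation links by its total number of links, so that every publication contributes the same total weight to the clustering regardless of how densely its field cites. The function name and toy data are hypothetical.

```python
def normalized_relatedness(neighbours):
    """Weight each link of publication i by 1 / (number of links of i).

    neighbours: dict mapping a publication to the set of publications it is
    linked to by a direct citation. Returns a nested dict a[i][j].
    This normalization is an assumption of the sketch, not a formula
    taken from the slides.
    """
    return {
        i: {j: 1.0 / len(linked) for j in linked}
        for i, linked in neighbours.items()
        if linked
    }


# Hypothetical example: "dense" has four links, "sparse" has one.
neighbours = {
    "dense": {"a", "b", "c", "sparse"},
    "sparse": {"dense"},
}
weights = normalized_relatedness(neighbours)
```

After normalization the weights of each publication sum to 1, so a publication with many references and citations does not dominate one from a field with sparse citation practices. Note that the resulting weights are not symmetric: here the link counts 0.25 from the side of "dense" and 1.0 from the side of "sparse".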

Page 27: A new software tool for large-scale analysis of citation networks

Map of the 672 research areas at level 2 of the classification system

Page 28: A new software tool for large-scale analysis of citation networks

Map of the 417 publications in research area 4.30.10

Page 29: A new software tool for large-scale analysis of citation networks

New software tool for exploring large-scale citation networks

Part 2

Page 30: A new software tool for large-scale analysis of citation networks

Exploring citation networks: Why?

• To support literature reviewing

• To show how the scientific literature has evolved over time

• To delineate topics or research areas in the literature

• To identify connections between different topics in the literature

Page 31: A new software tool for large-scale analysis of citation networks

Motivation for a new tool

• VOSviewer has proven to be a very useful tool for visualizing science from a static point of view

• VOSviewer has not been developed for visualizing the dynamics of science

• In fact, the availability of software tools for dynamic visualizations is rather limited:

– CiteSpace (Chaomei Chen)

– HistCite (Eugene Garfield)

Page 32: A new software tool for large-scale analysis of citation networks

HistCite

• Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Garfield
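
To give a feel for what a timeline visualization involves, the sketch below places publications on a grid with one row per publication year and draws each citation as a line between two grid positions. It is a minimal illustration with hypothetical names and data, not HistCite's actual layout method.

```python
from collections import defaultdict


def timeline_layout(years):
    """Assign each publication a (column, year) position.

    years: dict mapping a publication to its publication year.
    Publications of the same year share a row and are spread over
    columns in alphabetical order (an arbitrary choice of this sketch).
    """
    by_year = defaultdict(list)
    for pub in sorted(years):
        by_year[years[pub]].append(pub)
    return {
        pub: (column, year)
        for year, pubs in by_year.items()
        for column, pub in enumerate(pubs)
    }


def citation_segments(citations, positions):
    """Return one (start, end) pair of positions per (citing, cited) pair."""
    return [(positions[citing], positions[cited]) for citing, cited in citations]


# Hypothetical data.
years = {"a": 2008, "b": 2010, "c": 2010, "d": 2012}
citations = [("b", "a"), ("c", "a"), ("d", "b")]
positions = timeline_layout(years)
segments = citation_segments(citations, positions)
```

Because a citation normally points from a later publication to an earlier one, the segments run from later rows to earlier rows, which is what makes the historical development visible.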

Page 33: A new software tool for large-scale analysis of citation networks

Citation Network Explorer

• Somewhat similar to HistCite, but capable of dealing with much larger citation networks

• So far, the tool has been used successfully with the entire Web of Science citation network of the social sciences (1980–2013; ~2M publications and ~20M citations)

• The aim is to be able to handle the entire citation network of all scientific disciplines (~40M publications and ~500M citations)
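
The slides do not say how the tool stores its networks. As a back-of-the-envelope check on the target size, the sketch below stores a citation network in a compressed adjacency layout (one offsets array, one targets array) and estimates the memory this would need for ~40M publications and ~500M citations. The layout, the integer widths, and all names are assumptions made for this example.

```python
from array import array


def to_csr(n, citations):
    """Pack (citing, cited) pairs, with ids 0..n-1, into two flat arrays.

    The references of publication i are targets[offsets[i]:offsets[i + 1]].
    """
    offsets = array("q", [0]) * (n + 1)      # 8-byte integers
    for citing, _ in citations:
        offsets[citing + 1] += 1
    for i in range(n):                       # prefix sums over the counts
        offsets[i + 1] += offsets[i]
    targets = array("i", [0]) * len(citations)   # typically 4-byte integers
    cursor = list(offsets[:n])               # next free slot per publication
    for citing, cited in citations:
        targets[cursor[citing]] = cited
        cursor[citing] += 1
    return offsets, targets


def references(offsets, targets, pub):
    """List the publications cited by pub."""
    return list(targets[offsets[pub]:offsets[pub + 1]])


def estimated_bytes(n_publications, n_citations, id_bytes=4, offset_bytes=8):
    """Memory for the two arrays, ignoring titles, years, and other metadata."""
    return n_citations * id_bytes + (n_publications + 1) * offset_bytes


offsets, targets = to_csr(4, [(2, 0), (2, 1), (3, 2), (3, 0)])
full_network_bytes = estimated_bytes(40_000_000, 500_000_000)
```

Under these assumptions the bare citation links of the full network take roughly 2.3 GB, which suggests that the network structure itself could fit in the memory of a well-equipped desktop machine; metadata and a reverse (cited-by) index would add to this.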

Page 34: A new software tool for large-scale analysis of citation networks

Today’s demonstration (1)

• We demonstrate a prototype of the tool

• The core functionality is available, but some options have not yet been fully implemented

• Your feedback is very much appreciated!

Page 35: A new software tool for large-scale analysis of citation networks

Today’s demonstration (2)

• Data set 1:

– Scientometrics

– 1980–2013

– ~10K publications and ~60K citations

• Data set 2:

– All social sciences except for psychology, education, and health-related sciences

– 1980–2013

– ~1.4M publications and ~10M citations

Page 36: A new software tool for large-scale analysis of citation networks

Citation Network Explorer

Page 37: A new software tool for large-scale analysis of citation networks

References

Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.

Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.

Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.

Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.

Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392.

Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. arXiv:1308.6604.

Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.