A new software tool for large-scale analysis of citation networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Workshop “Measuring the Diversity of Research”, Berlin
September 2, 2013
2
Today’s talk
• Part 1: CWTS research program on bibliometric network analysis
– VOSviewer
– VOS mapping and clustering
– Large-scale modularity optimization
– Algorithmically constructed publication-level classification system
• Part 2: New software tool for large-scale analysis of citation networks
3
CWTS research program on bibliometric network analysis
Part 1
4
VOSviewer (1)
(Van Eck & Waltman, Scientometrics, 2010)
5
VOSviewer (2)
(Van Eck & Waltman, Scientometrics, 2010)
6
Subject categories
7
Leiden University
8
Erasmus University Rotterdam
9
Delft University of Technology
10
Clinical Neurology
11
Clinical Neurology: Citation density
(Van Eck et al., PLoS ONE, 2013)
12
Clinical Neurology: Reference density
13
VOS mapping and clustering
• Mapping and clustering are commonly used bibliometric network analysis techniques
• Mapping:
– Assigning the nodes in a network to locations in a (usually two-dimensional) space
– VOS mapping technique has been developed specifically for mapping bibliometric networks
• Clustering:
– Partitioning the nodes in a network into a number of groups (a.k.a. community detection)
– VOS clustering technique has been developed to be used jointly with the VOS mapping technique in a unified technical framework
14
Unified approach: Clustering seen as mapping in a restricted space
16
Unified approach to mapping and clustering
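Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2 m c_{ij}}{c_i c_j} d_{ij}^2 - \sum_{i<j} d_{ij}$$

where
$n$: number of nodes in the network
$m$: number of links in the network
$c_{ij}$: number of links between nodes $i$ and $j$
$c_i$: number of links of node $i$

Mapping
$x_i$: vector denoting the location of node $i$ in a $p$-dimensional map

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering
$x_i$: integer denoting the cluster to which node $i$ belongs
$\gamma$: resolution parameter

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$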
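As a rough illustration of the unified objective on this slide, the following Python sketch evaluates Q for a small network under both distance definitions. The function names, the toy network (two triangles joined by one link), the map coordinates, and the name `gamma` for the resolution parameter are all assumptions made for the example; this is not code from VOSviewer. The sketch also assumes every node has at least one link.

```python
import math
from itertools import combinations

def vos_objective(c, dist):
    """Q = sum_{i<j} (2*m*c_ij / (c_i*c_j)) * d_ij^2 - sum_{i<j} d_ij."""
    degree = [sum(row) for row in c]   # c_i: number of links of node i
    m = sum(degree) / 2                # m: number of links in the network
    q = 0.0
    for i, j in combinations(range(len(c)), 2):
        d = dist(i, j)
        q += (2 * m * c[i][j]) / (degree[i] * degree[j]) * d ** 2 - d
    return q

def mapping_distance(x):
    """Mapping case: x[i] is a coordinate vector, d_ij is Euclidean."""
    return lambda i, j: math.dist(x[i], x[j])

def clustering_distance(x, gamma=1.0):
    """Clustering case: x[i] is a cluster label, d_ij is 0 or 1/gamma."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma

# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
c = [[0] * 6 for _ in range(6)]
for i, j in edges:
    c[i][j] = c[j][i] = 1

q_one_cluster = vos_objective(c, clustering_distance([0] * 6))
q_two_clusters = vos_objective(c, clustering_distance([0, 0, 0, 1, 1, 1]))
q_map = vos_objective(
    c, mapping_distance([(0, 0), (0, 1), (1, 0.5), (2, 0.5), (3, 0), (3, 1)])
)
print(q_one_cluster, q_two_clusters, q_map)
```

Because Q is minimized, the lower value for the two-triangle partition indicates that it is preferred over putting all nodes in a single cluster at `gamma = 1`.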
17
Unified approach: Mapping
• Equivalent to the VOS mapping technique
• Closely related to multidimensional scaling (Van Eck et al., JASIST, 2010)
18
Unified approach: Clustering
• Equivalent to a weighted and parameterized variant of modularity-based clustering (Waltman et al., JOI, 2010)
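• The resolution parameter $\gamma$ makes it possible to customize the granularity level of the clustering
Maximize

$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j)\, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)$$

where $\delta(x_i, x_j)$ equals 1 if $x_i = x_j$ and 0 otherwise, and

$$w_{ij} = \frac{2m}{c_i c_j}$$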
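To make the role of the resolution parameter concrete, here is a minimal Python sketch that evaluates the weighted, parameterized modularity function of this slide for a few candidate partitions. The function name `weighted_modularity`, the name `gamma`, and the toy network are assumptions for illustration only, and the sketch assumes every node has at least one link.

```python
from itertools import combinations

def weighted_modularity(c, clusters, gamma=1.0):
    """(1/2m) * sum_{i<j} delta(x_i, x_j) * w_ij * (c_ij - gamma*c_i*c_j/(2m))."""
    degree = [sum(row) for row in c]   # c_i
    m = sum(degree) / 2                # number of links
    total = 0.0
    for i, j in combinations(range(len(c)), 2):
        if clusters[i] == clusters[j]:             # delta(x_i, x_j) = 1
            w = 2 * m / (degree[i] * degree[j])    # w_ij
            total += w * (c[i][j] - gamma * degree[i] * degree[j] / (2 * m))
    return total / (2 * m)

# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
c = [[0] * 6 for _ in range(6)]
for i, j in edges:
    c[i][j] = c[j][i] = 1

one_cluster = [0, 0, 0, 0, 0, 0]
two_clusters = [0, 0, 0, 1, 1, 1]
singletons = [0, 1, 2, 3, 4, 5]

for gamma in (0.1, 1.0, 4.0):
    scores = {
        name: weighted_modularity(c, part, gamma)
        for name, part in [("one", one_cluster), ("two", two_clusters),
                           ("singletons", singletons)]
    }
    print(gamma, max(scores, key=scores.get), scores)
```

In this toy case a low `gamma` favours one large cluster, `gamma = 1` favours the two triangles, and a high `gamma` favours singletons, which is the granularity effect described in the bullet above.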
19
Large-scale modularity optimization
• Modularity optimization is one of the most popular approaches to clustering in networks
• Several variants of the original modularity function have been proposed, supporting for instance weighted networks and different resolution levels
• Optimization of modularity functions in large networks (with millions of nodes and edges) has received only limited attention but has important applications in bibliometrics
20
New algorithm for large-scale modularity optimization
• ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization
• Our proposed ‘smart local moving algorithm’ can be seen as an enhanced version of this algorithm (Waltman & Van Eck, 2013)
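Both algorithms named on this slide build on a local moving heuristic, in which nodes are repeatedly moved to the community that yields the largest modularity gain. The Python sketch below shows only that generic heuristic for the standard modularity function with a resolution parameter on an unweighted network. It is neither the full Louvain algorithm (there is no network aggregation phase) nor the smart local moving algorithm; all names and the toy network are assumptions.

```python
import random

def modularity(adj, community, gamma=1.0):
    """Standard modularity: sum over communities of L_c/m - gamma*(D_c/2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal, total = {}, {}
    for v, nbrs in adj.items():
        c = community[v]
        total[c] = total.get(c, 0) + len(nbrs)
        # Each internal link is counted twice (once from each endpoint).
        internal[c] = internal.get(c, 0) + sum(community[u] == c for u in nbrs)
    return sum(internal[c] / (2 * m) - gamma * (total[c] / (2 * m)) ** 2
               for c in total)

def local_moving(adj, gamma=1.0, seed=0):
    """Move nodes between communities until no single move improves modularity."""
    rng = random.Random(seed)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(degree.values()) / 2
    community = {v: v for v in adj}    # start from singleton communities
    comm_degree = dict(degree)         # total degree of each community
    nodes = list(adj)
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for v in nodes:
            old = community[v]
            comm_degree[old] -= degree[v]          # take v out of its community
            links = {}                             # links from v to each community
            for u in adj[v]:
                links[community[u]] = links.get(community[u], 0) + 1

            def gain(c):
                # Modularity gain of placing the isolated node v in community c.
                return links.get(c, 0) / m - gamma * degree[v] * comm_degree[c] / (2 * m * m)

            best, best_gain = old, gain(old)
            for c in links:
                g = gain(c)
                if g > best_gain + 1e-12:
                    best, best_gain = c, g
            community[v] = best
            comm_degree[best] += degree[v]
            improved = improved or best != old
    return community

# Hypothetical toy network: triangles 0-1-2 and 3-4-5 joined by link 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = local_moving(adj)
print(result, modularity(adj, result))
```

A local moving pass like this can get stuck in a local optimum on larger networks, which is why full algorithms combine it with further steps.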
21
Louvain algorithm
Q = 0.3791
Q = 0.4151
22
Smart local moving algorithm
Q = 0.4198
Q = 0.3791
23
Comparison
Network (nodes / edges)          Measure   Louvain   Smart local moving
Amazon (0.5M / 0.9M)             Qmin      0.9257    0.9335
                                 Qmax      0.9264    0.9338
                                 t         6         28
DBLP (0.4M / 1.0M)               Qmin      0.8203    0.8357
                                 Qmax      0.8227    0.8367
                                 t         7         26
IMDb (0.4M / 15.0M)              Qmin      0.6976    0.7050
                                 Qmax      0.7041    0.7077
                                 t         18        100
LiveJournal (4.0M / 34.7M)       Qmin      0.7441    0.7676
                                 Qmax      0.7557    0.7720
                                 t         350       1,549
WoS (10.6M / 104.5M)             Qmin      0.7714    0.7918
                                 Qmax      0.7786    0.7957
                                 t         6,800     19,994
Web uk-2005 (39.5M / 783.0M)     Qmin      0.9793    0.9801
                                 Qmax      0.9795    0.9801
                                 t         11,006    17,074
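The trade-off in the comparison can be summarized with a little arithmetic. The Python sketch below copies the Qmax and t values from the slide and computes, per network, the gain in Qmax and the ratio of running times. The variable and function names are assumptions, and since the slide does not state the unit of t, only the unit-free ratio is used.

```python
# Values copied from the comparison slide:
# (Qmax Louvain, Qmax smart local moving, t Louvain, t smart local moving)
results = {
    "Amazon": (0.9264, 0.9338, 6, 28),
    "DBLP": (0.8227, 0.8367, 7, 26),
    "IMDb": (0.7041, 0.7077, 18, 100),
    "LiveJournal": (0.7557, 0.7720, 350, 1549),
    "WoS": (0.7786, 0.7957, 6800, 19994),
    "Web uk-2005": (0.9795, 0.9801, 11006, 17074),
}

def tradeoff(q_louvain, q_slm, t_louvain, t_slm):
    """Return (gain in Qmax, running-time ratio) of smart local moving vs. Louvain."""
    return q_slm - q_louvain, t_slm / t_louvain

summary = {name: tradeoff(*row) for name, row in results.items()}
for name, (gain, ratio) in summary.items():
    print(f"{name:12s} Qmax gain = {gain:+.4f}   runtime ratio = {ratio:.1f}x")
```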
24
Classification systems of scientific publications
• Web of Science/Scopus journal subject categories:
– Scientific fields defined at the level of journals rather than individual publications
– Difficulties with multidisciplinary journals
– High level of aggregation
– Sometimes outdated or inaccurate
• Disciplinary classification systems:
– E.g., CA, JEL, MeSH, PACS
– Not available for all disciplines
– Sometimes outdated or inaccurate
25
Algorithmic classification systems (Waltman & Van Eck, JASIST, 2012)
• Why not algorithmically construct a classification system of science?
• We cluster publications (not journals) into fields based on citation relations
• Only direct citation relations are used; no co-citation or bibliographic coupling relations
• Fields are defined at different levels of granularity and are organized hierarchically
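The restriction to direct citation relations can be sketched in a few lines of Python. The example below turns a list of (citing, cited) pairs into an undirected publication network. The publication IDs are hypothetical, and treating a citation as an undirected link is an assumption of this sketch rather than something stated on the slide.

```python
from collections import defaultdict

def direct_citation_network(citations):
    """Map each publication to the set of publications it cites or is cited by."""
    neighbours = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:                 # ignore self-citations of a record
            neighbours[citing].add(cited)
            neighbours[cited].add(citing)
    return dict(neighbours)

# Hypothetical data: P3 cites P1 and P2, and P4 cites P3.
citations = [("P3", "P1"), ("P3", "P2"), ("P4", "P3")]
network = direct_citation_network(citations)
print(network)
```

P1 and P2 are co-cited by P3, but they are not linked to each other here, because only direct citation relations create links.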
26
Example
• 10.2 million publications from the period 2001–2010 indexed in Web of Science
• 97.6 million direct citation relations
• Classification system of 3 hierarchical levels:
– 20 broad disciplines
– 672 fields
– 22,412 subfields
• Clustering by optimizing a variant of the standard modularity function that accounts for differences across fields in citation practices
27
Map of the 672 research areas at level 2 of the classification system
28
Map of the 417 publications in research area 4.30.10
29
New software tool for exploring large-scale citation networks
Part 2
30
Exploring citation networks: Why?
• To support literature reviewing
• To show how the scientific literature has evolved over time
• To delineate topics or research areas in the literature
• To identify connections between different topics in the literature
Motivation for a new tool
• VOSviewer has proven to be a very useful tool for visualizing science from a static point of view
• VOSviewer has not been developed for visualizing the dynamics of science
• In fact, the availability of software tools for dynamic visualizations is rather limited:
– CiteSpace (Chaomei Chen)
– HistCite (Eugene Garfield)
31
32
HistCite
• Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Garfield
Citation Network Explorer
• Somewhat similar to HistCite, but capable of dealing with much larger citation networks
• So far, the tool has been used successfully with the entire Web of Science citation network of the social sciences (1980–2013; ~2M publications and ~20M citations)
• The aim is to be able to handle the entire citation network of all scientific disciplines (~40M publications and ~500M citations)
33
Today’s demonstration (1)
• We demonstrate a prototype of the tool
• The core functionality is available, but some options have not yet been fully implemented
• Your feedback is very much appreciated!
34
Today’s demonstration (2)
• Data set 1:
– Scientometrics
– 1980–2013
– ~10K publications and ~60K citations
• Data set 2:
– All social sciences except for psychology, education, and health-related sciences
– 1980–2013
– ~1.4M publications and ~10M citations
35
36
Citation Network Explorer
37
References
Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.
Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.
Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.
Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392.
Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. arXiv:1308.6604.
Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.