Applications of community detection in bibliometric network analysis
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
EURANDOM workshop “Networks with community structure”, Eindhoven
January 24, 2014
Outline
• Bibliometric network analysis at CWTS
• VOSviewer
• Unified approach to visualization and community detection
• Community detection in large citation networks
• CitNetExplorer
Bibliometric network analysis at CWTS
• In-house databases:
– Thomson Reuters Web of Science
– Elsevier Scopus
• Bibliometric networks:
– Publication citation networks
– Journal co-citation/bibliographic coupling networks
– Term co-occurrence networks
– Co-authorship networks
– Etc.
• Applications:
– Research institutions: Research assessment
– Scientific publishers: Journal profiling
– Funding agencies: Science policy analyses
VOSviewer (www.vosviewer.com)
(Van Eck & Waltman, Scientometrics, 2010)
Citation network of fields in Web of Science
Co-occurrence network of terms in clinical neurology
Unified approach to visualization and community detection
Visualization vs. community detection
• Visualization (‘mapping’):
– Assigning the nodes in a network to locations in a (usually two-dimensional) space
• Community detection (‘clustering’):
– Partitioning the nodes in a network into a number of groups
Community detection seen as visualization in a restricted space
Unified approach to visualization and community detection
Minimize
\[ Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} d_{ij}^2 - \sum_{i<j} d_{ij} \]
where
n: number of nodes in the network
m: total weight of all edges in the network
A_{ij}: weight of edge between nodes i and j
k_i: total weight of all edges of node i
Visualization
x_i: vector denoting the location of node i in a p-dimensional space
\[ d_{ij} = \| x_i - x_j \| = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} \]
Community detection
x_i: integer denoting the community to which node i belongs
\gamma: resolution parameter
\[ d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases} \]
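The following is a minimal sketch of how the quality function of the unified approach could be evaluated, assuming a symmetric weight matrix with no isolated nodes. The function names (`unified_quality`, `visualization_distances`, `clustering_distances`) and the toy network of two triangles joined by a single edge are illustrative assumptions, not part of VOSviewer.

```python
import numpy as np

def unified_quality(A, d):
    """Q = sum_{i<j} (2m / (k_i k_j)) A_ij d_ij^2 - sum_{i<j} d_ij (to be minimized)."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    k = A.sum(axis=1)                    # k_i: total edge weight of node i (must be > 0)
    m = A.sum() / 2.0                    # m: total edge weight, each edge counted once
    i, j = np.triu_indices(A.shape[0], k=1)   # all pairs with i < j
    s = 2.0 * m * A[i, j] / (k[i] * k[j])
    return float(np.sum(s * d[i, j] ** 2) - np.sum(d[i, j]))

def visualization_distances(X):
    """Euclidean distances between node locations (rows of X, p columns)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def clustering_distances(labels, gamma=1.0):
    """0 for nodes in the same community, 1/gamma otherwise."""
    labels = np.asarray(labels)
    return np.where(labels[:, None] == labels[None, :], 0.0, 1.0 / gamma)

# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0

# Community detection: compare three candidate partitions (lower Q is better).
q_two = unified_quality(A, clustering_distances([0, 0, 0, 1, 1, 1]))
q_one = unified_quality(A, clustering_distances([0, 0, 0, 0, 0, 0]))
q_single = unified_quality(A, clustering_distances([0, 1, 2, 3, 4, 5]))

# Visualization: the same function scores an arbitrary two-dimensional layout.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 0.5],
              [1.5, 0.5], [2.0, 0.0], [2.0, 1.0]])
q_layout = unified_quality(A, visualization_distances(X))
```

On this toy network the partition into the two triangles has a lower Q than both the single-community partition and the all-singletons partition, which is the behavior one would expect from the community detection reading of the function.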
Unified approach: Community detection
Equivalent to a weighted variant of modularity-based community detection (Waltman et al., 2010)
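Maximize
\[ \hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j) \, w_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right), \qquad w_{ij} = \frac{2m}{k_i k_j} \]
where \delta(x_i, x_j) equals 1 if x_i = x_j and 0 otherwise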
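The equivalence can be checked in a few lines. The sketch below assumes \gamma > 0, writes \delta_{ij} for \delta(x_i, x_j) and w_{ij} = 2m / (k_i k_j), and substitutes the community detection distances d_{ij} = (1 - \delta_{ij}) / \gamma into the function Q of the unified approach; the constant C is introduced here only for the derivation.

```latex
\begin{align*}
Q &= \sum_{i<j} w_{ij} A_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}
   = \sum_{i<j} (1 - \delta_{ij}) \left( \frac{w_{ij} A_{ij}}{\gamma^2} - \frac{1}{\gamma} \right) \\
  &= C - \frac{1}{\gamma^2} \sum_{i<j} \delta_{ij} \, w_{ij}
     \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right)
   = C - \frac{2m}{\gamma^2} \hat{Q},
\qquad
C = \sum_{i<j} \left( \frac{w_{ij} A_{ij}}{\gamma^2} - \frac{1}{\gamma} \right).
\end{align*}
```

Since C does not depend on the community assignment, minimizing Q over assignments is the same as maximizing \hat{Q}.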
Unified approach: Visualization
• Equivalent to the VOS (visualization of similarities) technique (Van Eck & Waltman, 2007)
• Limit case of multidimensional scaling (Van Eck et al., 2010)
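VOS:
\[ Q = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} \| x_i - x_j \|^2 - \sum_{i<j} \| x_i - x_j \| \]
MDS:
\[ \sum_{i<j} W_{ij} \left( \| x_i - x_j \| - D_{ij} \right)^2, \qquad D_{ij} = \frac{k_i k_j}{2m A_{ij}}, \qquad W_{ij} = \frac{2m}{k_i k_j} A_{ij} \]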
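The link between the two objective functions can be made explicit with a short expansion. This is a sketch under stated assumptions: it takes D_{ij} = k_i k_j / (2m A_{ij}) and W_{ij} = 2m A_{ij} / (k_i k_j), so that W_{ij} D_{ij} = 1, and it assumes A_{ij} > 0 for every pair so that all terms are finite. The symbol Q_{VOS} and the rescaled locations y_i = x_i / 2 are introduced only for this derivation.

```latex
\begin{align*}
\sum_{i<j} W_{ij} \left( \| x_i - x_j \| - D_{ij} \right)^2
  &= \sum_{i<j} W_{ij} \| x_i - x_j \|^2
     - 2 \sum_{i<j} \| x_i - x_j \|
     + \sum_{i<j} W_{ij} D_{ij}^2 \\
  &= 4 \left( \sum_{i<j} W_{ij} \| y_i - y_j \|^2 - \sum_{i<j} \| y_i - y_j \| \right)
     + \sum_{i<j} D_{ij}
   = 4 \, Q_{VOS}(y_1, \ldots, y_n) + \text{const}.
\end{align*}
```

Under these assumptions the minimizers of the two functions coincide up to a uniform rescaling of the map. Pairs with A_{ij} = 0 have an unbounded D_{ij} and a zero weight W_{ij}, which is consistent with describing VOS as a limit case rather than an ordinary instance of MDS.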
Unified approach
The most commonly used community detection technique (modularity) and the most commonly used visualization technique (MDS) can be brought together in a unified framework:
• Community detection: modularity (weighted)
• Visualization: VOS, a limit case of MDS
Community detection in large citation networks
Classification systems of scientific publications
• Web of Science/Scopus classification systems:
– Scientific fields defined at the level of journals rather than individual publications
– Difficulties with multidisciplinary journals
– High level of aggregation
– Sometimes outdated or inaccurate
• Disciplinary classification systems:
– E.g., CA, JEL, MeSH, PACS
– Not available for all disciplines
– Sometimes outdated or inaccurate
Algorithmically constructed classification systems
• Publications (not journals) are clustered into fields based on citation relations
• Fields are defined at different levels of granularity and are organized hierarchically
• Community detection based on a variant of the standard modularity function that accounts for differences in citation practices across fields
• Optimization using the smart local moving algorithm
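To make the optimization step more concrete, here is a minimal sketch of a plain local moving heuristic, in which each node is repeatedly moved to the community that gives the largest increase in modularity. It is not the smart local moving algorithm itself, and it uses the standard modularity function with a resolution parameter `gamma` rather than the field-normalized variant mentioned above. The function names and the toy network are assumptions made for illustration, and the network is assumed to be undirected without self-loops.

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Standard modularity with resolution parameter gamma."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)
    two_m = A.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - gamma * np.outer(k, k) / two_m) * same).sum() / two_m)

def local_moving(A, gamma=1.0, max_sweeps=100):
    """Greedy local moving: start from singletons, move nodes until no move helps."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1)
    two_m = A.sum()
    labels = np.arange(n)        # community ids are reused from the range 0..n-1
    K = k.copy()                 # K[c]: total edge weight of nodes in community c
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            old = labels[i]
            K[old] -= k[i]       # temporarily take node i out of its community
            w = np.zeros(n)      # w[c]: edge weight between node i and community c
            for j in np.flatnonzero(A[i]):
                if j != i:
                    w[labels[j]] += A[i, j]
            # Gain (up to a constant factor) of placing node i in each community.
            gain = w - gamma * k[i] * K / two_m
            best = int(np.argmax(gain))
            if gain[best] <= gain[old] + 1e-12:
                best = old       # move only on a strict improvement
            labels[i] = best
            K[best] += k[i]
            moved = moved or best != old
        if not moved:
            break
    return labels

# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0

labels = local_moving(A)
```

Local moving on its own can get stuck in poor local optima on large networks, which is why it is usually combined with further steps such as aggregating communities into a reduced network; those steps are omitted here.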
Example (Waltman & Van Eck, 2012)
• 10.2 million publications from the period 2001–2010 indexed in Web of Science
• 97.6 million citation relations
• Classification system of 3 hierarchical levels:
– 20 broad disciplines
– 672 fields
– 22,412 subfields
Visualization of 672 research areas at level 2 of the classification system
Visualization of 417 publications in research area 4.30.10
Application in a science policy context
CitNetExplorer (www.citnetexplorer.nl)
Exploring citation networks
• Macro-level applications:
– Studying the development of a research field over time
– Identifying research areas
• Micro-level applications:
– Studying the publication oeuvre of a researcher
– Supporting systematic literature reviewing
HistCite
• Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Eugene Garfield
CitNetExplorer
• New software tool for analyzing and visualizing citation networks
• Freely available on www.citnetexplorer.nl
• Runs on any system that offers Java support
• Citation networks can be constructed directly based on data downloaded from Web of Science
• Interactive functionality for drilling down into a citation network
• Very large citation networks can be handled, with millions of publications and tens of millions of citation relations
Demonstration
• Database: Web of Science
• Fields: Physics and multidisciplinary (Nature, PLoS ONE, PNAS, Science, etc.)
• Time period: 1998–2012
• Number of publications: ~1.8 million
• Number of citation relations: ~15.1 million
References
Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.
Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.
Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392.
Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. European Physical Journal B, 86(11), 471.
Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.