Page 1
Large-scale analysis of bibliometric
networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
International Conference on Data-driven Discovery:
When Data Science Meets Information Science
Beijing, China, June 20, 2016
Page 2
Bibliographic databases: ‘Big data’
               Web of Science   Scopus
Journals       12,000           20,000
Publications   45 million       35 million
Citations      1 billion        0.9 billion
Page 3
Bibliometric networks
Networks derived from a bibliographic database (Web of Science, Scopus):
• Citation network of pubs / authors / journals
• Co-authorship network of authors / organizations
• Co-citation network of pubs / authors / journals
• Co-occurrence network of keywords / terms
• Bibliographic coupling network of pubs / authors / journals
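As a rough illustration of how some of these networks can be derived from bibliographic records, the sketch below builds bibliographic coupling, co-citation, and co-authorship edge weights from toy data. The input layout (dictionaries of reference sets and author lists), the function names, and the use of simple full counting are assumptions made for this example; they are not taken from the slides or from any particular tool.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(refs):
    """Weight of (a, b) = number of references shared by citing pubs a and b."""
    weights = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def co_citation(refs):
    """Weight of (a, b) = number of pubs that cite both a and b."""
    weights = Counter()
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            weights[(a, b)] += 1
    return weights


def co_authorship(authors):
    """Weight of (a, b) = number of pubs co-authored by a and b."""
    weights = Counter()
    for names in authors.values():
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return weights


# Toy data (made up): citing publication -> set of cited publications.
refs = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"A", "B", "C"}}
# Toy data (made up): publication -> list of authors.
authors = {"P1": ["Li", "Smith"], "P2": ["Li", "Smith", "Chen"], "P3": ["Chen"]}

coupling = bibliographic_coupling(refs)
cocited = co_citation(refs)
coauthors = co_authorship(authors)
```

Each function returns a mapping from an unordered pair (stored in sorted order) to an edge weight, which is enough to feed a generic network layout or clustering routine.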
Page 4
Outline
• Software tools
• Network analysis techniques
• Analysis of data science
Page 6
Software tools
• VOSviewer (www.vosviewer.com)
– Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
– Tool for visualizing and analyzing citation networks of
publications
• Both tools have been developed together with my colleague Ludo Waltman
Page 8
VOSviewer: Overview
• Software tool for visualizing (bibliometric) networks
• Built-in support for popular bibliographic databases
• Text mining functionality
• Layout and clustering techniques
• Advanced visualization features:
– Smart labeling algorithm
– Overlay visualizations
– Density visualizations (‘heat map’)
• Users:
– Researchers
– Professional users (e.g., universities, libraries, funders, publishers)
Page 9
Map of university co-authorship
network
Page 10
Map of journal citation network
Page 11
CitNetExplorer
Page 12
VOSviewer:
• Any type of bibliometric network
• Co-authorship, direct citations, co-citation, and bibliographic coupling
• Time dimension is ignored
• Networks of at most ~10,000 nodes are supported
CitNetExplorer:
• Only citation networks of publications
• Direct citation between publications
• Time dimension is explicitly considered
• Millions of publications are supported
Page 13
Network analysis techniques
Page 14
Network analysis techniques
Layout:
• Assigning the nodes in a network to
locations in a (usually 2d) space
(a.k.a. mapping)
• Visualization of similarities (VOS)
Clustering:
• Partitioning the nodes in a network
into a number of groups (a.k.a.
community detection)
• Weighted modularity
• Smart local moving algorithm
Page 15
Clustering can be seen as mapping
in a restricted space
Page 17
Unified approach to mapping and
clustering
Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m A_{ij}}{k_i k_j} d_{ij}^2 - \sum_{i<j} d_{ij}$$

where
n: number of nodes in the network
m: total weight of all edges in the network
A_ij: weight of edge between nodes i and j
k_i: total weight of all edges of node i

Mapping
x_i: vector denoting the location of node i in a p-dimensional space

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering
x_i: integer denoting the community to which node i belongs
γ: resolution parameter

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$
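To make the unified quality function concrete, the following sketch evaluates a function of the form Q = Σ_{i<j} (2m A_ij / (k_i k_j)) d_ij² − Σ_{i<j} d_ij for both a mapping (Euclidean distances between node locations) and a clustering (distance 0 within a community and 1/γ between communities). This reading of the formula, the function names, and the toy network are assumptions made for illustration; the code only evaluates Q and does not perform the minimization.

```python
import math


def quality(A, d):
    """Evaluate Q for a symmetric weight matrix A and a distance function d(i, j).

    Assumes every node has at least one edge (k_i > 0).
    """
    n = len(A)
    k = [sum(row) for row in A]   # total edge weight of each node
    m = sum(k) / 2.0              # total edge weight of the network
    q = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dij = d(i, j)
            q += 2.0 * m * A[i][j] / (k[i] * k[j]) * dij ** 2 - dij
    return q


def mapping_distance(X):
    """Mapping case: X[i] is the location of node i in a p-dimensional space."""
    return lambda i, j: math.dist(X[i], X[j])


def clustering_distance(c, gamma=1.0):
    """Clustering case: c[i] is the community of node i, gamma the resolution."""
    return lambda i, j: 0.0 if c[i] == c[j] else 1.0 / gamma


# Toy network (made up): two triangles, 0-1-2 and 3-4-5, joined by edge 2-3.
A = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u][v] = A[v][u] = 1

q_two = quality(A, clustering_distance([0, 0, 0, 1, 1, 1]))
q_one = quality(A, clustering_distance([0, 0, 0, 0, 0, 0]))
q_singletons = quality(A, clustering_distance([0, 1, 2, 3, 4, 5]))
```

On this toy network the two-triangle clustering gives a lower Q than either a single community or six singleton communities, which matches the intuition that Q should be minimized.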
Page 18
Smart local moving algorithm
[Figure: steps of the smart local moving algorithm. Panels: original network; local moving heuristic; local moving heuristic in subnetworks; reduced network. Values shown: Q = 0.3791 and Q = 0.4198.]
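The local moving heuristic named on this slide can be sketched as follows. This is a minimal illustration of the local moving step only, not the full smart local moving algorithm (it omits the subnetwork and network reduction steps), and it uses standard modularity with a resolution parameter as the quality function, which is an assumption. All function names and the toy network are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """edges: iterable of (u, v, weight) for an undirected graph without self-loops."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def modularity(adj, comm, gamma=1.0):
    """Standard weighted modularity of the partition comm (node -> community)."""
    two_m = sum(w for u in adj for w in adj[u].values())
    internal = defaultdict(float)  # edge weight inside each community (both directions)
    total = defaultdict(float)     # summed node degree of each community
    for u in adj:
        total[comm[u]] += sum(adj[u].values())
        for v, w in adj[u].items():
            if comm[u] == comm[v]:
                internal[comm[u]] += w
    return sum(internal[c] / two_m - gamma * (total[c] / two_m) ** 2 for c in total)


def local_moving(adj, gamma=1.0, seed=0):
    """Move nodes one at a time to the neighbouring community with the largest gain."""
    rng = random.Random(seed)
    two_m = sum(w for u in adj for w in adj[u].values())
    degree = {u: sum(adj[u].values()) for u in adj}
    comm = {u: u for u in adj}   # start with every node in its own community
    total = dict(degree)         # summed degree per community label
    nodes = list(adj)
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for u in nodes:
            old = comm[u]
            total[old] -= degree[u]          # temporarily take u out of its community
            links = defaultdict(float)       # edge weight from u to each community
            for v, w in adj[u].items():
                links[comm[v]] += w
            best = old
            best_gain = links.get(old, 0.0) - gamma * degree[u] * total[old] / two_m
            for c, w in links.items():
                gain = w - gamma * degree[u] * total[c] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm[u] = best
            total[best] += degree[u]
            if best != old:
                improved = True
    return comm


# Toy network (made up): two triangles joined by a single edge.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1)]
adj = build_adjacency(edges)
partition = local_moving(adj)
```

Because a node only moves when the move strictly increases modularity, the loop terminates in a local optimum; the additional steps shown on the slide are aimed at improving on such local optima.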
Page 19
Algorithmically constructed
classification system of science
• 17.8 million publications from the period 2000–2015 indexed in Web of Science
• 282.4 million citation relations
• Classification system of 3 hierarchical levels:
– 27 broad disciplines
– 817 fields
– 4,113 subfields
Page 20
Breakdown of scientific literature into
817 fields
[Figure: map of the 817 fields. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]
Page 21
Publications in scientometrics
subfield
Page 22
Time-line map of highly cited
scientometrics publications
Page 23
Analysis of data science
Page 24
What is data science?
• Empirical operationalization of data science based
on publications with ‘data’ in title or abstract
Wikipedia: “Data Science is an interdisciplinary field
about processes and systems to extract knowledge
or insights from data … which is a continuation of
some of the data analysis fields such as statistics,
data mining, and predictive analytics”
LCDS: “Data Science … deals with finding, analyzing
and validating complex patterns in data. Data
Science methods are indispensable for maintaining a
competitive edge in all disciplines in science”
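The operationalization on this slide (publications with ‘data’ in title or abstract) can be sketched as below. The slides do not state the exact matching rule, so the case-insensitive whole-word match used here is an assumption, as are the record layout, the function names, and the toy records.

```python
import re
from collections import defaultdict

# Assumption: match 'data' as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)


def mentions_data(title, abstract=""):
    """True if 'data' occurs as a word in the title or the abstract."""
    return bool(DATA_WORD.search(title) or DATA_WORD.search(abstract))


def share_by_year(pubs):
    """pubs: iterable of dicts with 'year', 'title' and optional 'abstract'.

    Returns {year: fraction of that year's publications mentioning 'data'}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pub in pubs:
        totals[pub["year"]] += 1
        if mentions_data(pub["title"], pub.get("abstract", "")):
            hits[pub["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy records (made up).
pubs = [
    {"year": 2015, "title": "Big Data analytics in libraries"},
    {"year": 2015, "title": "A theory of citation", "abstract": "We propose a model."},
    {"year": 2016, "title": "Mapping science", "abstract": "Using data-driven methods."},
]
shares = share_by_year(pubs)
```

With the whole-word rule, terms such as "database" or "metadata" are not counted; a looser substring rule would count them, so the choice affects the resulting percentages.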
Page 25
Growth of data-driven research
[Figure: line chart of the percentage of publications per year, 1990 to 2015, on an axis from 0% to 20%. Series: % 'data' publications; % 'theory' publications.]
Page 27
Data-driven nature of different
scientific fields
[Figure: map of fields. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering. Legend: % pub. with ‘data’ in title or abstract.]
Page 28
Data-driven nature of different
scientific fields
[Figure: map of fields. Labeled fields: artificial intelligence; statistics; bioinformatics; neuroimaging; pattern recognition; astronomy; earth; water; climate; remote sensing; nutrition; obesity; addiction; accident analysis. Legend: % pub. with ‘data’ in title or abstract.]
Page 29
Data science fields (at least 25% ‘data’
publications)
[Figure: map of fields. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]
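The selection rule in this slide's title (fields with at least 25% ‘data’ publications) amounts to a simple threshold on per-field shares. The sketch below assumes that per-field counts are already available; the field names and counts are made up and the function name is hypothetical.

```python
def select_data_science_fields(field_counts, threshold=0.25):
    """field_counts: {field: (n_data_pubs, n_total_pubs)}.

    Returns the sorted names of fields whose share of 'data' publications
    is at least the threshold. Fields without publications are skipped.
    """
    selected = []
    for field, (n_data, n_total) in field_counts.items():
        if n_total > 0 and n_data / n_total >= threshold:
            selected.append(field)
    return sorted(selected)


# Made-up counts for illustration only.
example_counts = {
    "field A": (30, 100),
    "field B": (25, 100),
    "field C": (10, 100),
    "field D": (0, 0),
}
selected_fields = select_data_science_fields(example_counts)
```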
Page 30
Term map of data science fields
Page 31
China’s publication output in data
science fields
[Figure: map of fields. Labeled areas: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]
Page 32
China’s publication output in data
science fields
[Figure: map of fields. Labeled fields: artificial intelligence; pattern recognition; high energy; earth; atmospheres; weather; remote sensing.]
Page 33
Chinese institutes with most publications
in data science fields (2011-2015)
• Chinese Academy of Sciences
• Peking University
• Tsinghua University
• China University of Geosciences
• Zhejiang University
• Nanjing University
• Shanghai Jiao Tong University
• University of Science and Technology of China
• Beijing Normal University
• University of Hong Kong
Page 34
CAS publication output in data
science fields
[Figure: map of fields. Labeled fields: earth; atmospheres; weather; remote sensing; vegetation; astronomy; high energy.]
Page 35
Term map based on CAS publications in
data science fields
Page 36
CAS (Beijing Branch) publication
output in data science fields
[Figure: map of fields. Labeled fields: astronomy; earth; atmospheres; weather; remote sensing; vegetation; high energy.]
Page 37
CAS (Shanghai Branch) publication
output in data science fields
[Figure: map of fields. Labeled fields: bioinformatics; genetics; astronomy; nuclear.]
Page 38
Do it yourself!
www.vosviewer.com www.citnetexplorer.nl
Page 39
Thank you for your attention!