Overview of Citation Analysis Clickstream Data Yields High-Resolution Maps of Science. By Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Public Library of Science ONE, March 11, 2009.
Whose articles cite a body of work? Is this a high-impact journal? How might others assess my scholarly impact? Citation analysis is one of the primary methods used to answer these questions.
Academics, publishers, and funders often study the patterns of citations in the academic literature in order to explore the relationships among researchers, topics, and publications, and to measure the impact of articles, journals, and individuals.
In this two-hour workshop, we will provide an overview of citation analysis, including: sources of data for citation analysis, common impact measures, and freely available software.
Attendees of the class will be eligible for an individual consultation session to explore individual projects and questions.
Overview of Citation Analysis
Micah Altman, Director of Research
MIT Libraries
Sean Thomas, Program Manager for Scholarly Repository Services and Product Manager of DSpace@MIT
Prepared for
IAPril
MIT
April 2014
DISCLAIMER
These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
Collaborators & Co-Conspirators
• Thanks to:
  – Michael Noga
  – Peter Cohn
  – Courtney Crummett
Related Work
• K. Smith-Yoshimura, et al., 2014, Registering Researchers in
Which questions are bibliometrics being used to answer?
Some examples:
• What are the most influential journals in a particular field?
• How influential is this scholar?
• Where is interdisciplinary research occurring?
• Which groups of people effectively collaborate?
• Which institutions are using funding most
What are impact factors?
• Descriptive statistics
• Usually based on citations
• Commonly treated as a proxy for the level of influence of an article, person, or journal
Common measures
• ISI Journal Impact Factor:
The frequency with which the “average article” in a journal has been cited in a particular year. It is based on citations in that year to articles the journal published in the two preceding years. It is only supplied for journals indexed by ISI in the Web of Science.
• Article Citation Count:
The total number of citations that the target article has received from other articles.
• H-Index:
The largest number h such that h articles have each received at least h citations.
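As a rough illustration of how these measures are computed, here is a minimal Python sketch. It assumes the usual two-year form of the impact factor (citations in one year to items published in the two preceding years, divided by the number of citable items published in those years). The function names and all numbers are hypothetical and do not come from the slides.

```python
def impact_factor(citations_this_year, citable_items):
    """Two-year impact factor: citations received this year by items the
    journal published in the previous two years, per citable item."""
    if citable_items == 0:
        raise ValueError("no citable items in the two-year window")
    return citations_this_year / citable_items


def h_index(citation_counts):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical numbers, for illustration only.
journal_if = impact_factor(citations_this_year=300, citable_items=120)
scholar_h = h_index([10, 8, 5, 4, 3])
print(journal_if)  # 2.5
print(scholar_h)   # 4
```

The article citation count needs no function of its own: it is simply each entry in the list passed to h_index.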
interactions modeled as an induced network (or graph)
• Units of observation form nodes
• Relationships form edges
Common measures
• Community detection
  – Modularity
  – Clustering
  – Clique
• Centrality
  – Betweenness
  – Degree
  – Closeness
• Diameter
• Visualization
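A small sketch of three of these measures (degree centrality, closeness centrality, and diameter) on a toy undirected network, using only the Python standard library. The graph and function names are made up for illustration, and the sketch assumes a connected, unweighted graph. Dedicated network-analysis packages would normally be used on real data.

```python
from collections import deque

# Hypothetical undirected network: node -> set of neighbours.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def degree_centrality(g):
    """Share of the other nodes that each node is directly connected to."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def distances_from(g, source):
    """Shortest-path lengths (in edges) from source, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(g, node):
    """(n - 1) divided by the summed distance to all other nodes."""
    dist = distances_from(g, node)
    return (len(g) - 1) / sum(dist.values())


def diameter(g):
    """Greatest shortest-path distance between any two nodes."""
    return max(max(distances_from(g, s).values()) for s in g)


print(degree_centrality(graph)["C"])     # 0.75
print(closeness_centrality(graph, "C"))  # 0.8
print(diameter(graph))                   # 3
```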
Network Analysis: Example – CitNetExplorer
1. Use WOS to locate records
2. Add records to “marked list”
3. Click “marked list”
4. Check “cited references”
5. Save to other file formats
6. Select Windows tab-delimited
7. Open in CitNetExplorer
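The tab-delimited file saved in these steps can also be read directly in a script. The sketch below is only illustrative: it assumes the export has a header row of two-letter field tags (here TI for title and CR for cited references) and that cited references are separated by semicolons. The sample rows are invented; check a real export before relying on these assumptions.

```python
import csv
import io

# Tiny made-up sample; a real export has many more columns and rows.
SAMPLE = (
    "AU\tTI\tPY\tCR\n"
    "Smith, J\tPaper one\t2012\t"
    "Jones A, 2005, J INFORMETR; Lee B, 2008, SCIENTOMETRICS\n"
    "Lee, B\tPaper two\t2013\tJones A, 2005, J INFORMETR\n"
)


def citation_edges(handle):
    """Return (citing title, cited reference) pairs from a tab-delimited file.

    Assumes columns named TI and CR, with references separated by ';'.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    edges = []
    for row in reader:
        citing = row["TI"]
        cited_field = row.get("CR") or ""
        for ref in cited_field.split(";"):
            ref = ref.strip()
            if ref:
                edges.append((citing, ref))
    return edges


edges = citation_edges(io.StringIO(SAMPLE))
for citing, cited in edges:
    print(citing, "->", cited)
```

For a file on disk, pass an open file handle instead of the io.StringIO object.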
CoAuthorship Analysis Example – Using R and JSTOR – Part 1
Note: Results are biased downward if only a sample of records is used!
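The R code from this slide is not part of the transcript. As a stand-in, here is a rough Python sketch of the same idea: count how often each pair of authors appears together, then count each author's distinct collaborators. The author lists are hypothetical. Running it on a subset of the papers shows the downward bias mentioned in the note.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists, one list per paper.
papers = [
    ["A", "B"],
    ["A", "C", "D"],
    ["A", "B", "E"],
    ["C", "D"],
]


def coauthor_counts(papers):
    """Number of papers shared by each unordered pair of authors."""
    pairs = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs


def collaborator_counts(papers):
    """Number of distinct co-authors per author."""
    partners = {}
    for a, b in coauthor_counts(papers):
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {author: len(p) for author, p in partners.items()}


full = collaborator_counts(papers)
sample = collaborator_counts(papers[:2])  # only the first two records
print(full["A"])    # 4
print(sample["A"])  # 3, lower because some papers are missing
```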
Limitations
Limitations of data
• Citation differs systematically from sharing, reading, or ‘use’
• Relationships signaled by citation are heterogeneous: citations may indicate evidentiary support, definitions, disagreement, kudos,…
• Cited objects are heterogeneous, e.g. journals include letters, comments, reviews and original research
• Databases may have limited or inconsistent coverage of publishers, fields, years, types of publications (e.g. conference proceedings), or types of objects (databases, software, books, articles)
• Some types of objects are often used without being cited
Limitations of measures
• Most measures are vulnerable to self-citation and other sorts of manipulation
• Most measures are descriptive estimates; they are not forecasts or causal inferences
• Few studies of the external validity of measures
• Few studies on error and bias in estimators
Tools
(Built-in tools) (Analysis tools)
Built-in Tools
• Database portals have built-in tools: Google Scholar; Scholarometer; Web of Science …
• Typical restrictions of built-in tools
  – Single database
  – Number of records
  – Usually single-author/single journal metrics
  – Lacks statistical forecasting/causal models
  – Limited data-cleaning options
  – Simple visualizations
• Built-in vs. external
• Free vs. fee-based
• Command line vs. interactive
• Open source vs. closed source
• Domain
  – Data extraction, retrieval, integration
  – Data cleaning and manipulation
  – Network visualization
  – Advanced measures
  – Statistical analysis
Choosing tools.
• Simple standard impact measures: built-in database tools; Publish or Perish; Scholarometer
• Spreadsheet/database combination
  – Ease of use of spreadsheets
  – Reporting and manipulative power of databases
• Filters, facets, and clustering
  – Allow granular overview of what’s in your data
  – Easily see occurrence distribution of values
  – Easily make global corrections
• Supports both row-level and record-level (multi-row) operations
openrefine.org
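To make the facet and clustering ideas concrete, here is a small Python sketch. It is not OpenRefine code; it only imitates two behaviours in plain Python: a text facet (how often each value occurs) and a key-collision style of clustering in which values that reduce to the same normalized "fingerprint" are grouped as likely variants. The sample values and function names are hypothetical.

```python
import string
from collections import Counter, defaultdict

# Hypothetical affiliation strings with inconsistent spelling.
values = [
    "MIT Libraries",
    "mit libraries",
    "Libraries, MIT",
    "Harvard Library",
    "MIT Libraries ",
]


def text_facet(values):
    """Occurrence distribution of the raw values, most common first."""
    return Counter(values).most_common()


def fingerprint(value):
    """Normalize: trim, lowercase, drop punctuation, sort unique tokens."""
    no_punct = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(no_punct)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct raw values that share a fingerprint."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: g for key, g in groups.items() if len(g) > 1}


print(text_facet(values))
print(cluster(values))
```

Once a cluster has been reviewed, every variant in it can be replaced with one agreed spelling, which is the kind of global correction the slide refers to.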
Open Refine – Reorganize Data
Reorganizing Data
• Splitting/joining multi-
• Author identifiers give you a way to reliably and unambiguously connect your name(s) with your work throughout your career, including your papers, data, biographical information, etc. This can be helpful in a number of ways:
• Provides a means to distinguish between you and other authors with identical or similar names.
• Links together all of your works even if you have used different names over the course of your career.
• Makes it easy for others (grant funders, other researchers etc.) to find your research output.
• Ensures that your work is clearly attributed to you.
Getting started with ORCID...
• ORCID (Open Researcher and Contributor ID) is a non-proprietary, non-profit, community-based registry of research identifiers.
• Links authors to their datasets and other works in addition to articles.
• Authors can control what information in their ORCID profile they share. Only the ORCID ID is automatically shared. (See their privacy policy.)
• It is easy to import research output from other sources (including ResearcherID, Scopus Author ID, and the DataCite Metadata Store) to your ORCID profile. (See ORCID's import works page.)
• Many organizations and publishers have created integrations with ORCID including Nature Publishing Group, Elsevier, and the American Physical Society.
• H-Index: The largest number h such that h articles have each received at least h citations.
• Centrality: A measure of the importance of a node in the network, based on a selected abstract model of influence or flow across the network. Centrality measures include degree centrality (number of connections); closeness centrality (distance of the node to other nodes in the network); and betweenness centrality (proportion of information that must pass through the node to go from one part of the network to another).
• (ISI Journal) Impact Factor: The frequency with which the “average article” in a journal has been cited in a particular year. It is based on citations in that year to articles the journal published in the two preceding years. It is only supplied for journals indexed by ISI in the Web of Science.
• Clustering: A method that partitions n observations into k clusters based on the characteristics of the observations. Clusters are defined either by a set of heuristics for forming the cluster, or according to a solution concept that the clusters will satisfy.
One common algorithm, K-Means, assigns each observation to one of a fixed number K of clusters such that each observation belongs to the cluster whose mean value is closest to that of the observation.
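A minimal sketch of the K-Means idea on one-dimensional data (for example, citation counts), written in plain Python. It alternates the two steps described above: assign each value to the nearest mean, then recompute the means. The starting centers and data are hypothetical, and real analyses would normally use a statistics library with better initialization.

```python
def k_means_1d(values, centers, max_iter=100):
    """Cluster scalars by repeated assignment and mean update,
    starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest mean.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: recompute each mean; an empty cluster keeps its center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical citation counts and starting centers (K = 2).
counts = [1, 2, 3, 40, 45, 50]
centers, clusters = k_means_1d(counts, centers=[1, 50])
print(centers)   # [2.0, 45.0]
print(clusters)  # [[1, 2, 3], [40, 45, 50]]
```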
• Network community structure measures: The detection of highly interconnected groups of nodes within a network. Methods include hierarchical clustering; information maximization; modularity; clique detection.
• Network Diameter: The greatest distance between any two nodes in the network.
• PageRank: A family of iteratively calculated recursive impact factors in which citations from other journals are weighted by the impact of those journals.
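The recursive weighting in the last entry can be sketched as a simple power iteration. This is one common textbook formulation, not the exact procedure of any particular journal ranking. The damping value of 0.85, the handling of journals that cite nothing, and the toy journal network are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each node to the list of nodes it cites."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its current weight to the nodes it cites.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node that cites nothing spreads its weight evenly.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical journals: J1 and J2 cite J3; J3 cites J1.
journals = {"J1": ["J3"], "J2": ["J3"], "J3": ["J1"]}
ranks = pagerank(journals)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy network J1 scores well above J2 even though each is cited by at most one journal, because J1's single citation comes from the highly ranked J3.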