Overview of Bibliometrics - IAP Course

Overview of Citation Analysis

Clickstream Data Yields High-Resolution Maps of Science. By Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Public Library of Science ONE, March 11, 2009.


Micah Altman, Director of Research

MIT Libraries

Sean Thomas, Program Manager for Scholarly Repository Services and the Product Manager of DSpace@MIT

Prepared for

IAPril

MIT

April 2014


DISCLAIMER

These opinions are my own. They are not the opinions of MIT, Brookings, any of the project funders, or (with the exception of co-authored, previously published work) my collaborators.

Secondary disclaimer:

“It’s tough to make predictions, especially about the future!”

-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.


Collaborators & Co-Conspirators

• Thanks to:
– Michael Noga
– Peter Cohn
– Courtney Crummett


Related Work

• K. Smith-Yoshimura, et al., 2014, Registering Researchers in Authority Files, OCLC Research.
• Liz Allen, Jo Scott, Amy Brand, Marjorie M.K. Hlava, Micah Altman (Forthcoming), Beyond authorship: recognising the contributions to research; Nature.
• Data Synthesis Task Group. 2014. Joint Principles for Data Citation.
• CODATA Data Citation Task Group, 2013. Out of Cite, Out of Mind: The Current State of Practice, Policy and Technology for Data Citation. Data Science Journal. 2013;12:1–75.

Slides and reprints available from: informatics.mit.edu


And now, a word from our sponsor…

The Libraries @ MIT

The MIT Libraries provide support for all researchers at MIT:

• Research consulting, including: bibliographic information management; literature searches; subject-specific consultation
• Data management, including: data management plan consulting; data archiving; metadata creation
• Data acquisition and analysis, including: database licensing; statistical software training; GIS consulting, analysis & data collection
• Scholarly publishing: open access publication & licensing

libraries.mit.edu


Roadmap

• Background
• Data
• Metrics
• Tools
• Data Processing
• Putting it all together
• Resources


Background

(Why?) (What?) (Which?)


What are bibliometrics? (Simple definition)

Bibliometrics are measures of scholarly outputs.

Scholarly output affects the reputation, ranking, and funding of disciplines, institutions, and individual scholars.

We initially use bibliometric analysis to look at the top institutions, by publications and citation count for the past ten years…

Universities are ranked by several indicators of academic or research performance, including… highly cited researchers…

Citations… are the best understood and most widely accepted measure of research strength.

Capturing Contributor Roles in Scholarly Publications

Then

Clarke, Beverly L. "Multiple authorship trends in scientific papers." Science 143.3608 (1964): 822-824.


Now


Now is More


What are bibliometrics? (Extended definition)

• Analysis of the characteristics of, and relationships among, research/scholarly outputs and publications
– Analysis includes: lists, descriptive statistics, visualization, inference
– Outputs include: grants, articles, books, databases, software, patents


Which questions are bibliometrics being used to answer?

Some examples:

• What are the most influential journals in a particular field?
• How influential is this scholar?
• Where is interdisciplinary research occurring?
• Which groups of people effectively collaborate?
• Which institutions are using funding most productively?


Data

(Leading Databases) (Subject-Specific) (MIT Internal) (Selection)

Google Scholar

Data Sources
• Unspecified coverage, but…
• Wide coverage of books, preprints, conference proceedings, non-English work, working papers, patents, institutional repositories

Built-in Metrics
• Journal h-index
• Author profiles
– Total & five-year counts
– i10-index and h-index
– Yearly citations
• Limited filtering

scholar.google.com


Scopus

Data
• Frequently updated/current
• Covers journal articles published after 1995
• Wide disciplinary coverage
• Includes theses and patents, and citations from these
• Includes some institutional repositories
• Commercial

Metrics
• Citation lists & counts
• Author impact & articles
– Statistics
– Metrics
– Graphs
• Journal impact
– Statistics
– Metrics
– Graphs

scopus.com


Web of Science

Data
• Journal coverage after 1899
• Many conference proceedings since 1990
• Many books since 2005
• Limited coverage of non-English works
• Doesn’t index institutional repositories and e-print servers
• Commercial

Metrics
• Citation lists & counts
• Author impact & articles
– Statistics
– Metrics
– Graphs
• Journal impact
– Statistics
– Metrics
– Graphs

apps.webofknowledge.com


Major Subject-Specific Catalogs with Citation Metrics
• SciFinder: Chemical Abstracts. scifinder.cas.org
• PsycINFO: psychological literature. www.apa.org/pubs/databases/psycinfo/
• Business Source Complete: business articles. www.ebscohost.com/academic/business-source-complete
• arXiv: physics, mathematics, nonlinear sciences, computer science, quantitative biology, quantitative finance, statistics (integrates with NASA-ADS and INSPIRE). arxiv.org
• MathSciNet: Mathematical Reviews; computes collaboration distances. www.ams.org/mathscinet/
• IEEE Digital Library: content published by the IEEE, including citing references
• USPTO: find patents that are cited by/cite others. uspto.gov/patft/
• ACM Digital Library: full text and citations of ACM articles and proceedings. dl.acm.org

VERA (list of MIT-licensed databases): owens.mit.edu/sfx_local/az/mit_db



APIs for Scholarly Resources

What are APIs?

• Application programming interfaces (APIs) are tools used to expose raw data, query interfaces, or other functions to other software applications
• Typically more flexible than interactive interfaces
• Require programming skills

libguides.mit.edu/apis
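As an illustration of what working with an API involves, here is a minimal Python sketch against the public arXiv API (arXiv appears in the subject-specific list above). It assumes the `export.arxiv.org/api/query` endpoint with its `search_query`, `start`, and `max_results` parameters, which returns an Atom XML feed. To keep the example runnable offline, the network call is only shown in a comment and a tiny hand-written feed stands in for a real response; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds

def arxiv_query_url(search_query, start=0, max_results=10):
    """Build an arXiv API query URL (hypothetical helper)."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_API + "?" + urlencode(params)

def parse_entries(atom_xml):
    """Return (title, [author names]) for each <entry> in an Atom feed."""
    root = ET.fromstring(atom_xml)
    records = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        authors = [a.findtext(ATOM + "name", default="")
                   for a in entry.findall(ATOM + "author")]
        records.append((title, authors))
    return records

# Stand-in for a real response. To fetch live data instead:
#   from urllib.request import urlopen
#   feed = urlopen(arxiv_query_url("au:bollen")).read()
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An example paper</title>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(arxiv_query_url("au:bollen"))
    print(parse_entries(SAMPLE_FEED))
```

Real responses also carry paging metadata, and arXiv's API documentation asks clients to pause between repeated requests, so check it before harvesting at scale.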




MIT Internal Data

Institute Data (Restricted Use)
• IS&T Data Warehouse: data from administrative systems, e.g. MIT people, organizations, grants and awards. ist.mit.edu/warehouse
• Office of the Provost – Institutional Research: provides analytical and research support to the Provost, academic departments, research laboratories and centers. web.mit.edu/ir/

Libraries Data
• DSpace@MIT: lists of publications in DSpace by author/department. dspace.mit.edu
• Barton: lists of MIT theses by author/advisor. library.mit.edu


Comparing Databases

Coverage
• Years
• Disciplines
• Publishers/sources
• Venue: journals/conferences/working papers/institutional repositories (IR)/personal web sites
• Documentation of coverage
• Completeness

Characteristics
• Internal vs. external
• Free vs. fee-based
• API vs. interactive
• Open data vs. restrictively licensed
• Structured vs. unstructured
• Full text vs. metadata


Selecting a Database

• Free, quick, and useful: Google Scholar
• Extract data for further simple analysis: Scholarometer (Google Scholar extract), Scopus, WOS
• More complete coverage: use multiple databases
• Specialized subject/single article: disciplinary database/API
• Extract data for network analysis: API

(Options run from Free & Easy at the top to $$ and/or programmatic at the bottom.)


Measures

(Article Metrics) (Author Impact) (Journal Impact) (Collaboration) (Network Analysis)


Article Metrics: Overview

What are article-level metrics?

• Measures on specific published articles
• Typically used in constructing literature reviews, or as building blocks for other measures

Common measures
• Citation list
• Citation counts
• References
• Captures/bookmarks
• Downloads
• Mentions
• Likes
• Views
• Readers

sparc.arl.org/sites/default/files/sparc-alm-primer.pdf




Article Metrics: Using Google Scholar

Steps
1. Go to scholar.google.com
2. Search (full text + metadata)
– Unstructured keyword search, OR
– “Advanced” fielded search
3. Sort
– by relevance, OR
– by date
4. Filter
– by date range, AND/OR
– by corpus (case law, patents)

Results
• Number of citations to the article indexed in Google Scholar
• List of citing articles
• Article text (sometimes)



Article Metrics: Example – Google Scholar


Article Metrics: Altmetrics

Types
• Captures/bookmarks
• Downloads
• Mentions
• Likes
• Views
• Readers

Sources
• Social media
• Reference management (e.g. CiteULike, Mendeley)
• Indexes/searches (e.g. Scopus)

Providers
• PLOS article metrics: article-level-metrics.plos.org
• Plum Analytics: plumanalytics.com
• ImpactStory: impactstory.org


Article Metrics: Database Comparison

• Google Scholar, Scopus, WOS
– Coverage: wide variety
– Measures: citation count, citation list
• PLOS
– Coverage: PLOS articles only
– Measures: citation count, citation list, views, downloads, mentions, bookmarks, comments
• PlumX
– Coverage: wide variety
– Measures: citation count, citation list, views, downloads, mentions, bookmarks, comments


‘Impact’ Factors: Overview

What are impact factors?
• Descriptive statistics
• Usually based on citations
• Commonly treated as a proxy for the level of influence of an article, person, or journal

Common measures
• ISI Journal Impact Factor: The frequency with which the “average article” in a journal has been cited in a particular year. It counts citations in that year to articles the journal published in the two preceding years. It is only supplied for journals indexed by ISI in the Web of Science.
• Article Citation Count: Total number of citations the target article has received from other articles.
• H-Index: The maximum number h such that h articles have each received at least h citations.

libraries.mit.edu/scholarly/publishing/impact-factors/
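The common measures above are simple enough to compute directly once citation counts have been exported (e.g. from Scopus, Web of Science, or Scholarometer). The Python sketch below is illustrative, not any database's own implementation: `h_index` follows the h-index definition, `i10_index` follows Google Scholar's i10-index (articles with at least 10 citations), and `journal_impact_factor` simply takes the two-year numerator and denominator as inputs. All citation counts and journal totals are hypothetical.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h

def i10_index(citations):
    """Number of articles with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)

def journal_impact_factor(cites_this_year, citable_items):
    """Two-year impact factor for year Y:
    cites_this_year = citations received in Y by items published in Y-1 and Y-2
    citable_items   = number of citable items published in Y-1 and Y-2
    """
    return cites_this_year / citable_items

# Hypothetical author with seven papers.
papers = [25, 8, 5, 3, 3, 1, 0]
print(h_index(papers), i10_index(papers))   # 3 1
print(journal_impact_factor(420, 200))       # 2.1 (hypothetical journal)
```

Because the h-index depends only on the sorted counts, the same function works for an author, a journal, or any other set of articles; it also inherits whatever coverage gaps the source database has.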




Author Impact: Example – Google Scholar


Author Impact: Example – Exporting Data with Scholarometer


Author Impact: Example – Web of Science


Author Impact: Database Comparison

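• Google Scholar
– Select any author: only those with profiles
– Export data: no
– Exclude articles: no
– Metrics: h-index, i10-index, number of citations
– Visualization: minimal
– Longitudinal: minimal
• Google Scholar + Scholarometer
– Select any author: yes
– Export data: yes
– Exclude articles: yes
– Metrics: h-index, i10-index, number of citations
– Visualization: minimal
– Longitudinal: minimal
• Scopus
– Select any author: yes
– Export data: yes
– Exclude articles: yes
– Metrics: h-index, …
– Visualization: yes
– Longitudinal: yes
• Web of Science
– Select any author: yes
– Export data: yes
– Exclude articles: yes
– Metrics: h-index
– Visualization: yes
– Longitudinal: yes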


Journal Impact: Using Online Services

Google Scholar
1. Go to scholar.google.com
2. Click on METRICS
3. Journals are listed in rank order with their h5-index
4. Filter by language & field

Scopus
• Go to scopus.com
• Click on Journal Analyzer
• Select journal
• Select statistics

Web of Science (Journal Citation Reports)
1. Go to admin-apps.webofknowledge.com/JCR/
2. Select field and year + SUBMIT
3. Select subject + SUBMIT






Journal Impact: Example – Google Scholar


Journal Impact: Example – Web of Science


Journal Impact: Example – Scopus


Journal Impact: Database Comparison

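• Google Scholar
– Journals covered: top 100 ranked in each language
– Metrics: h5-index, h5-median
– Visualization: no
– Longitudinal analysis: no
– Discipline rankings: no
• Scopus
– Journals covered: mostly English-language
– Metrics: many
– Visualization: yes
– Longitudinal analysis: yes
– Discipline rankings: no
• Web of Science
– Journals covered: many (selected) journals
– Metrics: impact factor, many others
– Visualization: yes
– Longitudinal analysis: yes
– Discipline rankings: yes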
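Google Scholar's journal metrics in the comparison above are built from the h-index: per Google Scholar Metrics, the h5-index is the h-index of the articles a publication released in the last five complete years, and the h5-median is the median citation count of the articles that make up that h5 core. A minimal Python sketch, assuming you already have (publication year, citation count) pairs for one journal; the data are hypothetical.

```python
from statistics import median

def h5_metrics(articles, last_complete_year):
    """articles: (publication_year, citations) pairs for one journal.
    Returns (h5-index, h5-median) for articles from the last five complete years."""
    first_year = last_complete_year - 4
    cites = sorted((c for year, c in articles if first_year <= year <= last_complete_year),
                   reverse=True)
    h5 = 0
    for rank, c in enumerate(cites, start=1):
        if c < rank:
            break
        h5 = rank
    core = cites[:h5]  # the articles that make up the h5-index
    return h5, (median(core) if core else 0)

# Hypothetical journal; the 2008 article falls outside the 2009-2013 window.
articles = [(2013, 40), (2012, 12), (2011, 9), (2010, 3), (2009, 2), (2008, 100)]
print(h5_metrics(articles, 2013))  # (3, 12)
```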


Network Analysis

What is network analysis?
• Study of objects and interactions modeled as an induced network (or graph)
• Units of observation form nodes
• Relationships form edges

Common measures
• Community detection
– Modularity
– Clustering
– Cliques
• Centrality
– Betweenness
– Degree
– Closeness
• Diameter
• Visualization
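To make the centrality and diameter measures concrete, here is a standard-library Python sketch on a small hypothetical co-authorship graph. NetworkX (listed under Tools) offers `degree_centrality`, `closeness_centrality`, and `diameter` ready-made; the hand-rolled versions below only show what is being computed. Closeness and diameter here assume a connected, unweighted, undirected graph.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}

def diameter(adj):
    """Greatest shortest-path distance between any two nodes."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)

# Hypothetical co-authorship edges: A wrote with B, C and D; D also wrote with E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
adj = build_adjacency(edges)
print(degree_centrality(adj)["A"], closeness_centrality(adj)["A"], diameter(adj))
```

Betweenness and modularity-based community detection need more machinery (e.g. Brandes' algorithm for betweenness), which is where NetworkX, Gephi, or Pajek earn their keep.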


Network Analysis: Example – CitNetExplorer


Network Analysis: Example – CitNetExplorer

1. Use WOS to locate records
2. Add records to the “marked list”
3. Click “marked list”
4. Check “cited references”
5. Save to other file formats
6. Select Windows tab-delimited
7. Open in CitNetExplorer
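If you would rather script the analysis than use CitNetExplorer, the same export can be read directly. A minimal Python sketch, assuming the tab-delimited file's first row holds Web of Science field tags, with `UT` as each record's accession number and `CR` holding its cited references separated by `; `. Check these tags (and the file's text encoding, which varies between exports) against your own download; the records below are hypothetical.

```python
import csv
import io

def citation_edges(export_text):
    """Yield (citing record, cited reference) pairs from WoS tab-delimited text.

    Assumes a header row of WoS field tags: UT identifies the record and
    CR lists its cited references separated by "; ".
    """
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        citing = (row.get("UT") or "").strip()
        for ref in (row.get("CR") or "").split("; "):
            ref = ref.strip()
            if citing and ref:
                yield citing, ref

# A two-record stand-in for a real export (hypothetical IDs and references).
sample = ("PT\tUT\tCR\n"
          "J\tWOS:0001\tSMITH J, 2001, J INFORMETR; DOE A, 1999, SCIENTOMETRICS\n"
          "J\tWOS:0002\tSMITH J, 2001, J INFORMETR\n")

edges = list(citation_edges(sample))
for citing, cited in edges:
    print(citing, "->", cited)
```

Each pair is one citing→cited edge that can feed Pajek, Gephi, or NetworkX. Cited-reference strings are not normalized, so one work may appear under several spellings (see Data Processing).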


CoAuthorship Analysis Example – Using R and JSTOR – Part 1


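% cut -d"," -f 1-11 citations.CSV > areastudies2003.csv

R> # Read the trimmed file (first 11 columns); authors within a record are tab-separated
R> areastudies.df <- read.table(file="areastudies2003.csv", row.names=NULL, sep=",", quote="", stringsAsFactors=FALSE, header=TRUE)
R> authorList <- strsplit(areastudies.df$author, split="\t", perl=TRUE)
R> # Distribution of the number of authors per article
R> plot(table(sapply(authorList, length)))

# For each author, build the set of everyone they appear with on any paper
# (each author's own name is included in their set)
createCoauthorlist <- function(pl) {
  coauthors <- list()
  updateCoauthor <- function(co, paperAuthors) {
    tmp <- union(coauthors[[co]], unlist(paperAuthors))
    coauthors[[co]] <<- tmp   # '<<-' updates the list in createCoauthorlist's scope
  }
  for (paper in pl) for (author in paper) updateCoauthor(author, paper)
  return(coauthors)
}

R> coa <- createCoauthorlist(authorList)
R> # Distribution of co-author set sizes
R> plot(table(sapply(coa, length)))

Note: Results are biased downward if only a sample of records is used!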


Limitations

Limitations of data
• Citation differs systematically from sharing, reading, or ‘use’
• Relationships signaled by citation are heterogeneous: citations may indicate evidentiary support, definitions, disagreement, kudos, …
• Cited objects are heterogeneous, e.g. journals include letters, comments, reviews and original research
• Databases may have limited or inconsistent coverage of publishers, fields, years, types of publications (e.g. conference proceedings), or types of objects (databases, software, books, articles)
• Some types of objects are often used without being cited

Limitations of measures
• Most measures are vulnerable to self-citation and other sorts of manipulation
• Most measures are descriptive estimates; they are not forecasts or causal inferences
• Few studies of the external validity of measures
• Few studies of error and bias in estimators
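As an example of the self-citation point, the sketch below counts citations with and without author self-citations. It is a minimal Python sketch that assumes author names are already disambiguated (see Matching Names) and treats a citation as a self-citation when the citing and cited papers share any author; other definitions (journal- or institution-level self-citation) would need a different test. The records are hypothetical.

```python
def citation_count(cited, citing_papers, exclude_self=False):
    """Count citations to `cited`. With exclude_self=True, skip citing papers
    that share at least one author with the cited paper (author-level
    self-citation, one of several possible definitions)."""
    count = 0
    for citing in citing_papers:
        if exclude_self and cited["authors"] & citing["authors"]:
            continue
        count += 1
    return count

# Hypothetical records: author sets are assumed to be already disambiguated.
paper = {"authors": {"A", "B"}}
citers = [{"authors": {"A"}}, {"authors": {"C"}},
          {"authors": {"B", "D"}}, {"authors": {"E"}}]
print(citation_count(paper, citers), citation_count(paper, citers, exclude_self=True))
```

Running a metric both ways is a cheap sensitivity check; a large gap suggests the raw number says as much about citing habits as about influence.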


Tools

(Built-in Tools) (Analysis Tools)


Built-in Tools

• Database portals have built-in tools: Google Scholar; Scholarometer; Web of Science …
• Typical restrictions of built-in tools
– Single database
– Number of records
– Usually single-author/single-journal metrics
– Lacks statistical forecasting/causal models
– Limited data-cleaning options
– Simple visualizations


External Tools

Feature sets
• Data retrieval
• Data processing (next section)
• Core statistics
• Visualization
• Exploratory network analysis
• Network modeling

Choosing a tool
• Open vs. closed source
• Free vs. commercial
• GUI vs. CLI
• Scalability
• Single platform/multi-platform
• Feature set
• Maintenance/support


Publish or Perish
• Automatic data retrieval
– MS Academic Search
– Google Scholar
• Standard single-author metrics
– Total number of papers and total number of citations
– Average citations per paper, citations per author, papers per author, and citations per year
– Hirsch's h-index and related parameters and variations
• Data export to CSV

www.harzing.com/pop.htm



Scholarometer

Data
• Google Scholar
• Crowd-sourced tags (disciplines), available through an API
• Data export to CSV

Metrics
• Single/combined author citation count/h-index rank
• Discipline rank
• Author network visualization
• Discipline network visualization

scholarometer.indiana.edu

http://scholarometer.indiana.edu/data.html


Pajek

Analysis
• Network visualization
• Supports complex networks: multi-relational, longitudinal, 2-mode
• Layout control
• Clustering
• Community detection

pajek.imfm.si

Source: www.public.asu.edu/~majansse/pubs/SupplementIHDP.htm
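CitNetExplorer exports to Pajek, and Pajek's plain-text `.net` format is also read by Gephi, so it is a convenient interchange format for networks built elsewhere. A minimal Python sketch, assuming a directed citation edge list: vertices are numbered from 1 and listed with quoted labels under `*Vertices`, followed by one `source target` line per edge under `*Arcs` (or `*Edges` for an undirected network). The record IDs are hypothetical.

```python
def to_pajek(edges, directed=True):
    """Serialize (source, target) pairs as Pajek .net text.
    Vertices are numbered from 1 in order of first appearance."""
    ids = {}
    for a, b in edges:
        for v in (a, b):
            if v not in ids:
                ids[v] = len(ids) + 1
    lines = [f"*Vertices {len(ids)}"]
    for label, i in ids.items():
        safe = str(label).replace('"', "'")  # a double quote would end the label early
        lines.append(f'{i} "{safe}"')
    lines.append("*Arcs" if directed else "*Edges")
    lines.extend(f"{ids[a]} {ids[b]}" for a, b in edges)
    return "\n".join(lines) + "\n"

# Hypothetical citations: records 0001 and 0002 both cite 0003.
citations = [("WOS:0001", "WOS:0003"), ("WOS:0002", "WOS:0003")]
print(to_pajek(citations), end="")
```

Writing the result to a file (e.g. `open("citations.net", "w").write(to_pajek(citations))`) gives something Pajek or Gephi can open; weights, vertex attributes, and 2-mode networks need extra Pajek syntax not shown here.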


CitNetExplorer

Features
• Citation/bibliometric-specific tool
• Web of Science import
• Pajek export
• Large networks (millions of publications)
• Simple network visualizations
• Network measures: connected components, clusters, core publications …

citnetexplorer.nl


SciMat

Features
• Workflow support
• Network visualization
• Data processing and cleanup
• Longitudinal analysis
• Metrics: h-index

sci2s.ugr.es/scimat/




Gephi

Analysis
• Network graphs & layout
• Dynamic filtering (including time sliders)
• Clustering
• SNA: betweenness, closeness, diameter, PageRank, HITS, …
• Community detection (modularity)

gephi.org


Sci2 Tool

Analysis and Visualization
• Temporal: burst detection
• Geospatial
• Topical
• Networks: trees and graphs

Additional Benefits
• Parsers for citation data
• Bibliometric analysis tools
• Portable output files
• Direct connections to R and Gephi

sci2.cns.iu.edu


Command-Line Tools

Using Python
• SciPy: scientific data processing, statistics, visualization. scipy.org
• NLTK: text processing and analysis. nltk.org
• NetworkX: network measures (descriptive). networkx.github.io
• Bibtools: parse WOS data and identify communities of co-citation. www.sebastian-grauwin.com/?page_id=492

Using R
• tm: simple text processing and analysis. cran.r-project.org/web/packages/tm/
• StatNet: network measures (descriptive); social network analysis (forecasting, causal); visualization. statnet.org
• CITAN: citation analysis. cran.r-project.org/web/packages/CITAN

Web integration for interactive visualization: d3js.org


Characteristics of Tools

• Built-in vs. external
• Free vs. fee-based
• Command line vs. interactive
• Open source vs. closed source
• Domain
– Data extraction, retrieval, integration
– Data cleaning and manipulation
– Network visualization
– Advanced measures
– Statistical analysis


Choosing Tools

• Simple standard impact measures: built-in database tools; Publish or Perish; Scholarometer
• Messy data: OpenRefine + …
• Network analysis
– Network measures: Sci2, SciMat, Pajek
– Visualizations: Gephi, Pajek, CitNetExplorer, SciMat
• Need to estimate complex statistical (predictive, causal) models: R
• Need maximum software flexibility and integration with other software: Python

(Options run from Quick Start at the top to Power Tools at the bottom.)


Data Processing

(Reorganizing Data) (Cleaning Data) (Matching Names)


OpenRefine

• Spreadsheet/database combination
– Ease of use of spreadsheets
– Reporting and manipulative power of databases
• Filters, facets, and clustering
– Allow a granular overview of what’s in your data
– Easily see the distribution of values
– Easily make global corrections
• Supports both row-level and record-level (multi-row) operations

openrefine.org


OpenRefine – Reorganize Data

Reorganizing Data
• Splitting/joining multi-valued cells
• Transposing rows/columns
• Supports logic-based transformation
– Google Refine Expression Language (GREL)
– Clojure
– Jython

openrefine.org


OpenRefine – Cleaning Data

Cleaning Data
• Duplicate detection
• Common data transformations
– Trimming whitespace
– Normalizing text case
• Cluster/edit for matching and normalization

Additional Benefits
• Perform mass edits efficiently
• Revision history allows roll-back to an earlier state
• Transformations recorded as JSON
– Portable to future data sets
• Browser-based

openrefine.org
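OpenRefine's cluster/edit offers several methods; its default key-collision method, "fingerprint", reduces each value to a normalized key so that variants of the same string collide. The Python sketch below approximates that idea (trim, lowercase, fold accents to ASCII, strip punctuation, sort and de-duplicate tokens); it is not OpenRefine's own code and may differ on edge cases. The names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Approximation of OpenRefine's 'fingerprint' keying method."""
    value = value.strip().lower()
    # Fold accented letters to plain ASCII (e.g. "ö" -> "o"); other non-ASCII is dropped.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^a-z0-9\s]", "", value)      # remove punctuation
    return " ".join(sorted(set(value.split())))    # sorted, de-duplicated tokens

def cluster(values):
    """Group values whose fingerprints collide; keep only groups of 2+ variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

names = ["Smith, John", "John Smith", "smith john ", "Jöhn Smith", "J. Smith", "Doe, Jane"]
print(cluster(names))
```

Note that "J. Smith" does not collide with "John Smith": key collision catches reordering, case, punctuation, and accent variants, but initials need nearest-neighbor methods or the approaches under Name Disambiguation.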


OpenRefine – Matching Names

Matching names
• Create filters to navigate larger datasets
• Create facets to see all unique values/occurrences
• Auto-detect variant entries
• Cluster/edit for matching and normalization
• Reconciliation services against external data for normalization/aggregation

openrefine.org


Name Disambiguation

Methods
• Dictionary-based entity matching
• Phonetic matching
• Rules-based linkage
• Probability-based linking
– Edit distance
– Fellegi-Sunter algorithm
– Machine learning

Tools
• Febrl: sourceforge.net/projects/febrl/
• RecordLinkage (for R): cran.r-project.org/web/packages/RecordLinkage/
• Link King (for SAS): the-link-king.com

Source: en.wikipedia.org/wiki/Record_linkage
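Two of the probability-based methods listed above can be combined in a few lines: edit (Levenshtein) distance to decide whether two fields approximately agree, and Fellegi-Sunter weights to score a record pair. In Fellegi-Sunter, each field has an m-probability (it agrees given the records are the same person) and a u-probability (it agrees given they are different people); an agreeing field adds log2(m/u) to the score and a disagreeing field adds log2((1-m)/(1-u)). The m and u values and the "edit distance ≤ 1" rule below are hypothetical; in practice they are estimated from data (for example with the EM algorithm), and scores are compared against match/non-match thresholds.

```python
from math import log2

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

# Hypothetical (m, u) probabilities per field:
#   m = P(field agrees | same person), u = P(field agrees | different people)
FIELDS = {"surname": (0.95, 0.01), "given": (0.90, 0.05), "affiliation": (0.80, 0.10)}

def agrees(field, x, y):
    """Surnames agree within edit distance 1 (an illustrative rule); others must match."""
    x, y = x.lower(), y.lower()
    return levenshtein(x, y) <= 1 if field == "surname" else x == y

def match_weight(rec1, rec2):
    """Fellegi-Sunter score: higher means more likely the same person."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(field, rec1[field], rec2[field]):
            score += log2(m / u)
        else:
            score += log2((1 - m) / (1 - u))
    return score

a = {"surname": "Smith", "given": "John", "affiliation": "MIT"}
b = {"surname": "Smyth", "given": "John", "affiliation": "MIT"}
c = {"surname": "Jones", "given": "Ann", "affiliation": "Harvard"}
print(round(match_weight(a, b), 2), round(match_weight(a, c), 2))
```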




Matching Names – Author Identifiers

What are Author Identifiers?

• Author identifiers give you a way to reliably and unambiguously connect your name(s) with your work throughout your career, including your papers, data, biographical information, etc. This can be helpful in a number of ways:

• Provides a means to distinguish between you and other authors with identical or similar names.

• Links together all of your works even if you have used different names over the course of your career.

• Makes it easy for others (grant funders, other researchers etc.) to find your research output.

• Ensures that your work is clearly attributed to you.

Getting started with ORCID...

• ORCID (Open Researcher and Contributor ID) is a non-proprietary, non-profit, community-based registry of research identifiers.

• Links authors to their datasets and other works in addition to articles.

• Authors can control what information in their ORCID profile they share. Only the ORCID ID is automatically shared. (See their privacy policy.)

• It is easy to import research output from other sources (including ResearcherID, Scopus Author ID, and the DataCite Metadata Store) to your ORCID profile. (See ORCID's import works page.)

• Many organizations and publishers have created integrations with ORCID including Nature Publishing Group, Elsevier, and the American Physical Society.

• Free, private, 30-second registration: orcid.org/register

libguides.mit.edu/content.php?pid=573578&sid=4729602
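ORCID iDs carry a check character, so typos can be caught before matching. ORCID documents it as the ISO 7064 MOD 11-2 checksum over the first 15 digits (the 16th character is 0-9 or X). A minimal Python sketch; it checks only the format and checksum, not whether the iD is actually registered (that would require a lookup against ORCID itself). 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.

```python
DIGITS = "0123456789"

def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Format and checksum test only; does not confirm the iD is registered."""
    chars = orcid.strip().replace("-", "")
    if len(chars) != 16 or not all(c in DIGITS for c in chars[:15]):
        return False
    return orcid_check_char(chars[:15]) == chars[15].upper()

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (one-digit typo)
```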




Application

(Combining External and Internal Sources)

(Co-authorship Analysis) (Visualization)


Citation analysis – export citations

Question: For a given paper’s citing articles, which other articles were frequently cited?
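Once the reference lists of the citing articles have been exported (and cleaned, e.g. in OpenRefine), answering this question is a counting exercise. A minimal Python sketch with hypothetical data: `frequently_cited` counts, for each reference, how many citing articles include it, and `cocitation_counts` counts how often each pair of references appears in the same article's reference list.

```python
from collections import Counter
from itertools import combinations

def frequently_cited(citing_records, seed, top=5):
    """citing_records: one list of cited references per article citing `seed`.
    Returns the `top` references (other than the seed) cited by the most articles."""
    counts = Counter()
    for refs in citing_records:
        counts.update(set(refs) - {seed})  # count each reference once per citing article
    return counts.most_common(top)

def cocitation_counts(citing_records):
    """Count how often each unordered pair of references is cited together."""
    pairs = Counter()
    for refs in citing_records:
        for pair in combinations(sorted(set(refs)), 2):
            pairs[pair] += 1
    return pairs

# Hypothetical reference lists from three articles that cite "Seed".
records = [["Seed", "Ref A", "Ref B"], ["Seed", "Ref A"], ["Seed", "Ref C", "Ref A"]]
print(frequently_cited(records, "Seed", top=1))
```

The pair counts are the edge weights of a co-citation network, which is the kind of input CitNetExplorer, Gephi, or Pajek can cluster and visualize.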


Citation analysis – Open Refine






Resources

(Readings) (Software) (Data) (Glossary)

Recommended Reading
• Data Processing – General
– Getting started: programminghistorian.org/lessons/cleaning-data-with-openrefine
– References: Verborgh, Ruben, and Max De Wilde. Using OpenRefine. Packt Publishing Ltd, 2013.
– Tutorials: github.com/OpenRefine/OpenRefine/wiki/External-Resources
• Data Processing – Dealing with Names
– Getting started: author identifiers guide, libguides.mit.edu/content.php?pid=573578&sid=4729602
– References: Winkler 2012; Name Matching and Record Linkages, U.S. Census, www.census.gov/srd/papers/pdf/rr93-8.pdf

Recommended Reading (Continued)

• Bibliometric Analysis
– Tutorials:
Anne-Wil Harzing, 2011, The Publish or Perish Book, part 3: Doing bibliometric research with Google Scholar, Tarma Software Press
Wouter De Nooy, et al., 2011, Exploratory Social Network Analysis with Pajek, 2nd Edition, Cambridge University Press
Author identifiers guide: libguides.mit.edu/content.php?pid=573578&sid=4729602
Article-level metrics: sparc.arl.org/sites/default/files/sparc-alm-primer.pdf
– References: Eric D. Kolaczyk, 2009, Statistical Analysis of Network Data: Methods and Models, Springer.




Available Databases & APIs

• Scholarly APIs: libguides.mit.edu/apis
• Google Scholar: scholar.google.com
• Scopus: scopus.com
• Web of Science: admin-apps.webofknowledge.com
• Author identifiers: libguides.mit.edu/content.php?pid=573578&sid=4729602
• List of MIT-licensed databases: owens.mit.edu/sfx_local/az/mit_db
• Altmetrics
– PLOS article metrics: article-level-metrics.plos.org
– Plum Analytics: plumanalytics.com
– ImpactStory: impactstory.org

Additional Selected Tools

• OpenRefine: openrefine.org
• Publish or Perish: www.harzing.com/pop.htm
• Scholarometer: scholarometer.indiana.edu
• CitNetExplorer: citnetexplorer.nl
• Gephi: gephi.org
• Sci2: sci2.cns.iu.edu
• Pajek: pajek.imfm.si
• SciMat: sci2s.ugr.es/scimat/
• R packages
– tm: cran.r-project.org/web/packages/tm/
– StatNet: statnet.org
– CITAN: cran.r-project.org/web/packages/CITAN
• Python packages
– SciPy: scipy.org
– NLTK: nltk.org
– NetworkX: networkx.github.io
– Bibtools: www.sebastian-grauwin.com/?page_id=492




Glossary of Metrics

• Author H-Index: The maximum number h such that h of the author's articles have each received at least h citations.

• Centrality: A measure of the importance of a node in the network, based on a selected abstract model of influence/flow across the network. Centrality measures include degree centrality (number of connections); closeness centrality (distance of the node to the other nodes in the network); and betweenness centrality (the proportion of shortest paths between other pairs of nodes that pass through the node).

• (ISI Journal) Impact Factor: The frequency with which the “average article” in a journal has been cited in a particular year. It counts citations in that year to articles the journal published in the two preceding years. It is only supplied for journals indexed by ISI in the Web of Science.

• Clustering: A method that partitions n observations into k clusters based on the characteristics of the observations. Clusters are defined either by a set of heuristics for forming the clusters, or by a solution concept that the clusters must satisfy. One common algorithm, k-means, assigns each observation to one of a fixed number k of clusters, such that each observation belongs to the cluster whose mean is closest to it.

• Network community structure measures: The detection of highly interconnected groups of nodes within a network. Methods include hierarchical clustering, information maximization, modularity, and clique detection.

• Network Diameter: The greatest distance (shortest-path length) between any two nodes in the network.

• PageRank: A family of iteratively calculated, recursive impact measures in which citations from other journals are weighted by the impact of those journals.
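The PageRank entry above can be illustrated with a short power-iteration sketch in Python. Each node starts with an equal score; on every iteration a node passes its score evenly to the nodes it cites, a damping factor (0.85 here, the conventional default, assumed) mixes in a uniform share, and nodes that cite nothing spread their score over all nodes. The three-journal citation graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank.

    links maps each node to the set of nodes it cites. A node's score is split
    evenly among the nodes it cites; nodes that cite nothing ("dangling")
    spread their score over all nodes.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # scores have stopped changing
            break
    return rank

# Hypothetical journals: J1 cites J2 and J3, J2 cites J3, J3 cites J1.
scores = pagerank({"J1": {"J2", "J3"}, "J2": {"J3"}, "J3": {"J1"}})
print({j: round(s, 3) for j, s in sorted(scores.items())})
```

J3 ends up highest because it is cited by both other journals, and J1 outranks J2 even though each receives one citation pathway, because J1's single citation comes from the highly ranked J3.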



Questions?

E-mail: [email protected]

Web: informatics.mit.edu
