© Tefko Saracevic 1
BIBLIOMETRICS
Tefko Saracevic
Rutgers University
http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic 2
What is?
“… all studies which seek to quantify processes of written communication.”
Pritchard
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.”
Fairthorne
Recorded communication - ‘literature’->quantitative methods
© Tefko Saracevic 3
Alan Pritchard 1969
Coined the term "bibliometrics":
"the application of mathematics and statistical methods to books and other media of communication"
Journal of Documentation (1969) 25(4):348-349
© Tefko Saracevic 4
and other related metrics …
Also used to study more than books, articles …
Scientometrics
  covering science in general, not just publications
Infometrics
  all information objects
Webmetrics or cybermetrics
  web connections, manifestations
  using bibliometric techniques to study the relationship or properties of different sites on the web
© Tefko Saracevic 5
Concepts
Basic (primitive) concepts:
1. Subject
2. Recorded communication -> document, information object
3. Subject literature
Bibliometrics related to:
  science of science
  sociology of science - numerical methods
© Tefko Saracevic 6
Literature studies
Qualitative
  often in humanities, librarianship
Quantitative
  bibliometrics
Mixed
© Tefko Saracevic 7
Reasons for quantitative studies of literature
Analysis of structure and dynamics
  search for regularities - predictions possible
Understanding of patterns
  “order out of documentary chaos”
  verification of models, assumptions
Rationale for policies & design
© Tefko Saracevic 8
Why quantitative studies?
Qualitative methods often depend on assertions, ‘authoritative’ statements, anecdotal evidence
Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible
© Tefko Saracevic 9
Application in ...
History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management
  utilization
© Tefko Saracevic 10
Historical note
Bibliometrics long precedes information science
But found intellectual home in information science
  study of a basic phenomenon - literature
It is not ‘hot’ lately, but still produces very interesting results
Branched out into web studies (web is a “literature” as well)
© Tefko Saracevic 11
What studied?
Governed by data available in documents or information resources in general - that which can be counted
author(s)
origin
  organization, country, language
source
  journal, publisher, patent …
© Tefko Saracevic 12
what … more
contents
  text, parts of text, subject, classes
representation
citations
  to a document, in a document, co-citation
utilization
  circulation, various uses
links
any other quantifiable attribute
© Tefko Saracevic 13
Tools
Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other databases
Web structures, links
© Tefko Saracevic 14
Variable: authors
number in a subject, field, institution, country
growth correlation with indicators like GNP, energy etc.
productivity e.g. Lotka’s law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping
© Tefko Saracevic 15
Variable: origin
Rates of production, size, growth by country, institution, language, subject
Comparison between these
Correlation with economic & other indicators
© Tefko Saracevic 16
Variable: sources
Concentration most often on journals
Growth, dynamics, numbers
  information explosion - exponential laws
  time movements, life cycles
Scatter - quantity/yield distribution
  Bradford’s law
Various distributions by subject, language, country
© Tefko Saracevic 17
Variable: contents
Analysis of texts
  distribution of words – Zipf’s law
  words, phrases in various parts
  subject analysis, classification
  co-word analysis
© Tefko Saracevic 18
Variable: representation
frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure
© Tefko Saracevic 19
Variable: citations
Studied a lot; many pragmatic results
  base for citation indexes, Web of Science, impact factors, co-citation studies etc.
Derived:
  number of references in articles
  number of citations to articles
  research front; citation classics
  bibliographic coupling
© Tefko Saracevic 20
citations … more
co-citations
  author connections, subject structure, networks, maps
  centrality of authors, papers
  validation with qualitative methods
impact
© Tefko Saracevic 21
Variable: utilization
frequency
distribution of requests for sources, titles e.g. 20/80 law
relevance judgement distributions
circulation patterns
use patterns
© Tefko Saracevic 22
Variable: links
Development of link-based metrics
  in-links, out-links
Web structure
Web page depth; update
PageRank vs quality
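The simplest link-based metrics, in-links and out-links, are plain counts over a list of hyperlinks. A minimal sketch, assuming links are given as (source page, target page) pairs; the function name and the sample page names in the usage note are illustrative only:

```python
from collections import Counter
from typing import Iterable, Tuple


def link_counts(links: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Return (in_links, out_links) counters for a list of (source, target) links."""
    in_links: Counter = Counter()
    out_links: Counter = Counter()
    for source, target in links:
        out_links[source] += 1  # the source page points outward
        in_links[target] += 1   # the target page receives a link
    return in_links, out_links
```

For the hypothetical links a->b, a->c and b->c, page c has two in-links and page a has two out-links.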
© Tefko Saracevic 23
Examples from classic studies
Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links
© Tefko Saracevic 24
Examples of laws & methods
Lotka’s law
Bradford’s law
Zipf’s law
Impact factor
Citation structures
Co-citation structures
© Tefko Saracevic 25
Alfred J. Lotka 1926
Statistics—the frequency distribution of scientific productivity
Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"
Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik
J. Washington Acad. Sci. 16:317-325
© Tefko Saracevic 26
Lotka’s law: x^n • y = C
The number of authors y in a given subject, each producing x publications, is inversely proportional to x raised to some power n.
Where:
  x = number of publications
  y = no. of authors credited with x publications
  n = constant (about 2 for scientific subjects)
  C = constant
inverse square law of scientific productivity
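A worked sketch of the law as stated: setting x = 1 gives y = C, so C can be read as the number of authors with a single publication. The function name and the figure of 100 single-paper authors in the usage note are illustrative assumptions, not data from Lotka's study.

```python
def lotka_expected_authors(single_paper_authors: float, x: int, n: float = 2.0) -> float:
    """Expected number of authors with exactly x publications under x^n * y = C.

    C is taken to be the number of authors with one publication,
    since x = 1 gives y = C.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    return single_paper_authors / (x ** n)


if __name__ == "__main__":
    for x in range(1, 5):
        print(x, round(lotka_expected_authors(100, x), 2))
```

With 100 single-paper authors and n = 2, the sketch gives 25 authors with two publications, about 11 with three, and 6.25 with four.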
© Tefko Saracevic 27
[Figure: Lotka's Law - scientific publications, x^n • y = C. Vertical axis: No. of authors; categories: 1 publ., 2 publ., 3 publ., 4 publ.]
© Tefko Saracevic 28
Samuel Clement Bradford 1934, 1948
Distribution of quantity vs yield of sources of information on specific subjects
  he studied journals as sources, but applicable to other sources
  what journals produce how many articles in a subject and how are they distributed? or
  How are articles in a subject scattered across journals?
Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called “documentary chaos”
First published in: Engineering (1934) 137:85-86, then in his book Documentation (1948)
© Tefko Saracevic 29
Bradford’s law
"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1 : n : n^2 : n^3 …"
© Tefko Saracevic 30
Bradford's Law of Scattering – an idealized example
No. of source journals   No. of articles per source   Total no. of articles
         1                         60                          60
         2                         35                          70
   (3 journals, 130 articles)
         1                         30                          30
         2                         25                          50
         2                          9                          18
         4                          8                          32
   (9 journals, 130 articles)
        10                          6                          60
         7                          5                          35
         5                          4                          20
         5                          3                          15
   (27 journals, 130 articles)
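The zone division can be sketched in a few lines: walk down the journals in order of decreasing productivity and close a zone each time it has accumulated its equal share of the articles. The data are the figures of the idealized example on this slide; the function name and the choice of three zones are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (number of journals, articles per journal), in order of decreasing productivity.
IDEALIZED: List[Tuple[int, int]] = [
    (1, 60), (2, 35), (1, 30), (2, 25), (2, 9),
    (4, 8), (10, 6), (7, 5), (5, 4), (5, 3),
]


def bradford_zones(classes: Sequence[Tuple[int, int]], zones: int = 3) -> List[Tuple[int, int]]:
    """Split ranked journals into zones holding roughly equal numbers of articles.

    Returns one (journals_in_zone, articles_in_zone) pair per zone.
    """
    total_articles = sum(count * per_journal for count, per_journal in classes)
    target = total_articles / zones
    result: List[Tuple[int, int]] = []
    journals = articles = 0
    for count, per_journal in classes:
        for _ in range(count):
            journals += 1
            articles += per_journal
            # Close the zone once it reaches its share; the last zone takes the rest.
            if articles >= target and len(result) < zones - 1:
                result.append((journals, articles))
                journals = articles = 0
    result.append((journals, articles))
    return result
```

On the idealized data this yields 3, 9 and 27 journals with 130 articles each, so the number of journals grows by a multiplier of 3 from zone to zone. Real data rarely divide this cleanly, which is why the sketch only asks for zones of roughly equal size.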
© Tefko Saracevic 31
Bradford's Law of Scattering – zones
3 sources, 130 articles (nucleus)
9 sources, 130 articles
27 sources, 130 articles
Garfield hypothesis
© Tefko Saracevic 32
George Kingsley Zipf 1935, 1949
The psycho-biology of language: an introduction to dynamic philology (1935)
Human behavior and the principle of least effort: An introduction to human ecology (1949)
Looked, among others, at frequency distributions of words in given texts
  counted distribution in James Joyce’s Ulysses
Provided an explanation as to why the observed distributions occur:
Principle of least effort
© Tefko Saracevic 33
Zipf’s law: r • f = c
Where:
  r = rank (in terms of frequency)
  f = frequency (no. of times the given word is used in the text)
  c = constant for the given text
For a given text the rank of a word multiplied by the frequency is a constant
Works well for high frequency words, not so well for low – thus a number of modifications
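To check the relation on a text, count the words, rank them by frequency and inspect the product r • f for each rank. A minimal sketch; the function name and the simple regular-expression tokenizer are assumptions, and a real study would need more careful word segmentation.

```python
import re
from collections import Counter
from typing import List, Tuple


def zipf_table(text: str, top: int = 10) -> List[Tuple[int, str, int, int]]:
    """Return (rank, word, frequency, rank * frequency) for the most frequent words."""
    # Crude tokenizer: runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    table = []
    for rank, (word, freq) in enumerate(counts.most_common(top), start=1):
        table.append((rank, word, freq, rank * freq))
    return table
```

If the law holds for a text, the last column stays roughly constant across the high-frequency ranks and drifts for the rare words.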
© Tefko Saracevic 34
Charles F. Gosnell 1944 Obsolescence
He studied obsolescence of books in academic libraries via their use
• College Res. Libr. (1944) 5:115-125
But this was extended to study of articles via citations, and other sources
Age of citations in articles in a subject:
  half-life: half of the citations are no more than x years old, etc.
different subjects have very different half-lives
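Read this way, the citation half-life is the median age of the cited items at the time of citing. A minimal sketch under that assumption; the function name and the sample years in the usage note are hypothetical.

```python
from statistics import median
from typing import Iterable


def citation_half_life(citing_year: int, cited_years: Iterable[int]) -> float:
    """Median age, in years, of the items cited by articles published in citing_year."""
    ages = [citing_year - year for year in cited_years]
    if not ages:
        raise ValueError("at least one cited item is needed")
    return median(ages)
```

For articles from 2005 citing items from 2004, 2003, 2001, 1995 and 1980, the ages are 1, 2, 4, 10 and 25 years, so the half-life is 4 years.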
© Tefko Saracevic 35
Curve of obsolescence
[Figure: curve of obsolescence. Vertical axis: Number of users; horizontal axis: Age at time of use]
© Tefko Saracevic 36
Eugene Garfield 1955
Focused on scientific & scholarly communication based on citations
• Science (1955) 122:108-111
Founded Institute for Scientific Information (ISI)
  major product now ISI Web of Knowledge
Impact factor for journals, based on how much a journal is cited
Mapping of a literature in a subject
Citation indexes/Web of Knowledge
  MAJOR resources in bibliometric studies
© Tefko Saracevic 37
Citation matrix
[Figure: an article linked to three cited articles and seven citing articles]
© Tefko Saracevic 38
Science Citation Index
Association-of-ideas index
[Figure: an article linked to three cited articles and seven citing articles]
© Tefko Saracevic 39
Co-citation analysis
Articles that cite the same article are likely to both be of interest to the reader of the cited article
[Figure: an article with two citing articles]
These two articles are likely to be related
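Two closely related pair counts can be sketched from a list of references. The first counts how many cited articles two citing articles share, which is the relation pictured on this slide and is usually called bibliographic coupling. The second counts how often two articles are cited together by the same later article, which is co-citation in the narrower sense. Function names and the input layout (citing article mapped to the set of articles it cites) are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable


def bibliographic_coupling(references: Dict[str, Iterable[str]]) -> Counter:
    """For each pair of citing articles, count the cited articles they share."""
    strength: Counter = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strength[(a, b)] = shared
    return strength


def co_citation(references: Dict[str, Iterable[str]]) -> Counter:
    """For each pair of cited articles, count the citing articles that cite both."""
    counts: Counter = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts in either table are candidates for being related; the counts say nothing by themselves about why the articles were cited.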
© Tefko Saracevic 40
Impact factor (IF)
number of citations received in current year by papers published in the journal in the previous two years
divided by
number of papers published in the journal in the previous two years
IF has become over time a crucial indicator of journal quality and given ISI a monopoly position in the evaluation of journal quality
Reported in Journal Citation Reports (1976-)
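The definition is a single division. A minimal sketch, with a hypothetical function name; the figures in the usage note are made up and do not describe any real journal.

```python
def impact_factor(citations_this_year: int, papers_previous_two_years: int) -> float:
    """Citations received this year by papers from the previous two years,
    divided by the number of papers the journal published in those two years."""
    if papers_previous_two_years <= 0:
        raise ValueError("the journal must have published at least one paper")
    return citations_this_year / papers_previous_two_years
```

A journal that published 120 papers in the previous two years and received 300 citations to them this year would have an impact factor of 2.5.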
© Tefko Saracevic 41
Garfield’s HistCite
“Bibliographic Analysis and Visualization Software”
Provides citation statistics & graphs for people, journals, institutions …
  various citation scores, no. of cited references in articles …
  various graphs with connections
Example: articles and authors for JASIST (and predecessor names) for 1956-2004
  includes citations to authors
© Tefko Saracevic 42
Conclusion
Bibliometrics, & related scientometrics, infometrics, webmetrics provide insight into a number of properties of information objects
  some general, predictive “laws” formulated
  structures have been exposed, graphed
  myriad data collected & analyzed
A good area for research!
© Tefko Saracevic 43
Sources used in making this presentation - among others
Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
Short set of bibliometric exercises by J. Downie
http://people.lis.uiuc.edu/~jdownie/biblio/