28 April 2004 Second Nordic Conference on Scholarly Communication
Citation Analysis for the Free, Online Literature
Tim Brody
Intelligence, Agents, Multimedia Group
University of Southampton
Content
• Current services for Open Access Literature
• Institutional Archives Registry
• Metadata Harvesting through Celestial
• Citebase Search
– Citation Linking
– Search and Navigation Service
• Web Impact as a predictor of Citation Impact
Institutional Archives Registry
Sites in the IAR
• Things we want to know:
– GNU EPrints sites
– Other research collections (Other Archives, Open Journals)
– BOAI-1 (self-archiving) vs. BOAI-2 (open-access journals)
• A submission form consisting of:
– URL, Name, OAI URL, Country, ‘type’, full-text, software
• Can’t (yet) track full-texts
• (Create a master-list so archives only register once?)
Celestial
• Designed to:
– Be an abstraction over OAI-PMH versions
– Cache OAI metadata records
• Technological questions:
– How big can OAI-PMH go? (OK for 5 million records so far)
– How reliable are OAI-PMH implementations?
• Feeds Citebase, IAR, some external users
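As a rough illustration of the harvesting step that Celestial caches, the sketch below builds an OAI-PMH 2.0 `ListRecords` request URL and reads record identifiers and the resumption token from a response. The base URL, the sample XML, and the function names are made up for the example; this is not Celestial's own code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request URL for an OAI-PMH 2.0 repository."""
    if token:
        # When resuming, resumptionToken is the only argument besides verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response fragment, trimmed to the elements read above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", token=next_token)
```

A harvester would repeat the request with each returned token until the repository stops sending one.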
Services for Open Access Literature
Diagram layers:
• Self-Archived Full-texts (Pre/Post-prints); Open Access Publishing
• Citation Analysis/Linking Services (Citebase / Citeseer / OpenURL / DOI)
• Version Linking Services
• Search Engines; Navigation Tools
• Analysis & Assessment
Diagram labels: Citebase, Citeseer, BMC, arXiv.org, OAI-PMH Transport, OAIster, Scirus
n.b. Scirus/OAIster aren’t citation-analysis aware yet; Google indexes Citeseer. Not an exhaustive list …
Citation Analysis & Linking
• A citation is a reference from one work to another [as a hyperlink: a citation link]
• Citation analysis uses citation relationships to analyse patterns in research
• As a graph, a work (paper, book etc.) is a vertex and a citation is an edge
• ‘Bibliometrics’
– (study of patterns in literature)
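The graph view above can be sketched in a few lines: works are vertices, citations are directed edges, and the citation count of a work is the in-degree of its vertex. The paper names and edge list are hypothetical.

```python
from collections import defaultdict

# Hypothetical citation edges: (citing work, cited work).
CITATIONS = [
    ("paperC", "paperA"),
    ("paperC", "paperB"),
    ("paperD", "paperA"),
    ("paperD", "paperC"),
]


def build_graph(edges):
    """Return (references, cited_by) adjacency maps for a directed citation graph."""
    references = defaultdict(set)  # work -> works it cites
    cited_by = defaultdict(set)    # work -> works that cite it
    for citing, cited in edges:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def citation_counts(edges):
    """Citation count of each cited work, i.e. the in-degree of its vertex."""
    _, cited_by = build_graph(edges)
    return {work: len(citers) for work, citers in cited_by.items()}


counts = citation_counts(CITATIONS)
```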
Digitometric/Informetric Analysis
• Bibliometrics for the online age
• Couple citation analysis with Web analysis
– (how many times has x been accessed?)
• Similar to readership studies, but easier to survey and more comprehensive
– (though subject to the same problems of copies being re-distributed, multiple accesses etc.)
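One simple way to soften the "multiple accesses" problem is to count each host at most once per article. The sketch below assumes the web log has already been reduced to (host, article) pairs; that record shape and the sample entries are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, already-parsed log entries: (requesting host, article id).
LOG_ENTRIES = [
    ("host1", "article-a"),
    ("host1", "article-a"),  # repeat access from the same host
    ("host2", "article-a"),
    ("host1", "article-b"),
]


def access_counts(log_entries):
    """Count accesses per article, counting each (host, article) pair once."""
    unique_pairs = set(log_entries)
    return Counter(article for _host, article in unique_pairs)


accesses = access_counts(LOG_ENTRIES)
```

This does nothing about copies redistributed outside the archive, which never appear in the log at all.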
Citebase Search
Components of Citebase (diagram):
• Repositories
• Metadata Harvest (OAI-PMH)
• Full-text Harvest
• Meta Database
• References Database
• Citation Database
• Web Interface
• OAI-PMH Interface
Citation Linking
• Retrieve and cache full-texts
– LaTeX, PDF, XML
• Extract reference list
• Extract individual references
• Parse references into components
– Author, year, title, journal, volume, pagination
• Store in structured database
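The splitting and parsing steps above can be sketched with regular expressions. This is a minimal illustration that assumes numbered entries in one simple "authors, journal volume, pages (year)" style; the references are invented, and Citebase's real parser is not described on the slide and is certainly more involved.

```python
import re

# Assumed reference style: "Authors, Journal Abbrev. VOLUME, PAGES (YEAR)."
REF_PATTERN = re.compile(
    r"^(?P<authors>.+?),\s+"
    r"(?P<journal>[^,\d]+?)\s+"
    r"(?P<volume>\d+),\s*"
    r"(?P<pages>\d+(?:-\d+)?)\s*"
    r"\((?P<year>\d{4})\)\.?$"
)


def split_references(reference_list):
    """Split a reference list whose entries start with markers such as [1], [2]."""
    parts = re.split(r"\s*\[\d+\]\s*", reference_list)
    return [part.strip() for part in parts if part.strip()]


def parse_reference(text):
    """Return the components of one reference as a dict, or None if it does not match."""
    match = REF_PATTERN.match(text.strip())
    return match.groupdict() if match else None


# Invented reference list, used only to exercise the functions.
REFERENCE_LIST = (
    "[1] A. Author and B. Writer, J. Example Stud. 12, 345 (2001).\n"
    "[2] C. Scholar, Ann. Imaginary Phys. 7, 89-101 (1999)."
)

parsed = [parse_reference(ref) for ref in split_references(REFERENCE_LIST)]
```

Each resulting dict maps directly onto a row in a structured references table; entries that return `None` would need a different pattern or manual handling.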
Citebase Search
Citebase Search: Navigation by Citation Links
Diagram labels:
• Current Article
• Co-cited
• Article with reference list
• Reference link
• Future
• Past
• Related
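A minimal sketch of how such navigation lists could be derived from stored citation links. Reading "Past" as the works the current article cites, "Future" as the works citing it, and "Related" as co-cited works (those cited alongside it by a third article) is an interpretation of the diagram labels; the edge data and function name are hypothetical.

```python
# Hypothetical citation edges: (citing article, cited article).
EDGES = [
    ("C", "A"),
    ("C", "B"),
    ("D", "A"),
    ("D", "C"),
    ("A", "Z"),
]


def navigation_links(edges, article):
    """Return the reference, cited-by and co-cited lists for one article."""
    # Works the article cites (its reference list).
    references = {cited for citing, cited in edges if citing == article}
    # Works that cite the article.
    cited_by = {citing for citing, cited in edges if cited == article}
    # Works cited by any article that also cites this one.
    co_cited = {
        cited
        for citing, cited in edges
        if citing in cited_by and cited != article
    }
    return {
        "references": sorted(references),
        "cited_by": sorted(cited_by),
        "co_cited": sorted(co_cited),
    }


links = navigation_links(EDGES, "A")
```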
Predicting Citation Impact
• The Web gives us access to new metrics
– Download/access frequency
• Can early-day ‘download’ frequency give an indication of longer-term citation frequency?
• (Web logs from the UK arXiv.org mirror, Citation data from Citebase Search)
• Pearson correlation after 6 months of web logs = 0.42 for the High Energy Physics sub-arXiv
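For reference, the Pearson correlation between per-paper download counts and citation counts can be computed as below. The numbers are invented to exercise the function; they are not the arXiv.org or Citebase data behind the figure quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Invented per-paper counts: downloads in an early window, citations later on.
downloads = [10, 20, 30, 40]
citations = [1, 3, 2, 5]
r = pearson_r(downloads, citations)
```

Repeating the calculation with download windows of increasing length gives one correlation value per window, which is the kind of curve a correlation-against-time plot shows.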
[Figure: correlation (r), axis 0 to 0.5, plotted against days since deposit, axis 0 to 800]
Assessing Research(ers)
• Citation Impact
– By-Paper, Author, [Journal, Institution]
• Web Impact
– Predictor of citation-impact, combine with citation-impact
• Search Engines
• More detailed research assessment
Comparing Online/Offline Impact
• Using ISI CD-ROM data
• Use Web crawlers to find ‘online’ articles
• Compare citation impact of online and offline articles
– By discipline, by journal, by author?
• Initial results for Physics show a 2-3x increase
– arXiv.org
• Southampton, U. Quebec, Oldenburg (de)
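One possible way to express such a comparison is the ratio of mean citation counts for online versus offline articles. The sketch below assumes that measure and uses invented records; it does not reproduce the study's method or its Physics results.

```python
from statistics import mean

# Invented article records; "online" marks articles found freely on the Web.
ARTICLES = [
    {"id": "a1", "online": True, "citations": 3},
    {"id": "a2", "online": True, "citations": 3},
    {"id": "a3", "online": False, "citations": 2},
    {"id": "a4", "online": False, "citations": 2},
]


def impact_ratio(articles):
    """Mean citations of online articles divided by mean citations of offline ones."""
    online = [a["citations"] for a in articles if a["online"]]
    offline = [a["citations"] for a in articles if not a["online"]]
    if not online or not offline:
        raise ValueError("need at least one online and one offline article")
    mean_offline = mean(offline)
    if mean_offline == 0:
        raise ValueError("offline articles have no citations; ratio undefined")
    return mean(online) / mean_offline


ratio = impact_ratio(ARTICLES)
```

Grouping the records by discipline, journal or author before calling the function gives the finer breakdowns the slide asks about.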
Relevant Web Pages
• EPrints
– http://www.eprints.org/
– IAR: http://archives.eprints.org/
• Citebase Search
– http://citebase.eprints.org/
• Celestial
– http://celestial.eprints.org/
• Correlation Generator
– http://citebase.eprints.org/analysis/correlation.php
• Tim Brody <[email protected]>