Based on the Los Alamos National Laboratory Digital Archive, we present our research into the use of the archive by authors and researchers from the international physics community.

Professor Stevan Harnad, University of Southampton
Dr. Les Carr, University of Southampton
Tim Brody, Ian Hickman

Open Citation Project - http://opcit.eprints.org/

e-print e-mbryology

Mar 27, 2015


Bryan Townsend


Growth of the LANL Archive

• Since the start of the archive in 1991 its usage has been steadily growing

• Now, after 10 years, it has over 130,000 papers

[Figure: Deposit Frequency. x-axis: Month (1991-07 to 2000-01); y-axis: Deposit Frequency (0 to 3000)]


LANL Authors

• Number of unique names identified in each year

• *Before 1995 author meta-data is missing in most sub-fields

Number of Identified Authors per Year

Year       Authors
1991*          411
1992*         1152
1993*         1439
1994*         5958
1995         15198
1996         17762
1997         22359
1998         27785
1999         32673
2000-06      19593
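The per-year count of unique names could be produced along these lines. This is only a sketch: the record layout (a year plus a list of author names) and the crude lower-casing used to merge spelling variants are assumptions, not the project's actual name-matching procedure.

```python
from collections import defaultdict


def unique_authors_per_year(records):
    """Count distinct author names per year.

    records: iterable of (year, [author names]) pairs (hypothetical layout).
    """
    names_by_year = defaultdict(set)
    for year, authors in records:
        for name in authors:
            # Crude normalisation (assumption): lower-case and collapse spaces.
            names_by_year[year].add(" ".join(name.lower().split()))
    return {year: len(names) for year, names in sorted(names_by_year.items())}


# Made-up records for illustration only.
sample_records = [
    (1995, ["A. Author", "B. Writer"]),
    (1995, ["a. author"]),
    (1996, ["C. Scholar"]),
]
author_counts = unique_authors_per_year(sample_records)
```

Real author disambiguation is considerably harder than this; the sketch only shows where the yearly totals come from.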


Citation Identification

• Based on automatic extraction and identification from the document source (Adobe Acrobat - .pdf)

• We have defined terminology for two types of citation:
  – “Red-Link”, an author cites a LANL pre-print article using a LANL reference (e.g. hep-th/0006010)
  – “Orange-Link”, an author cites a post-print, published article that is also deposited in the archive (e.g. Phys. Rev. D56 6588 (1997))

• Red and Orange links together identify over 600,000 of roughly 3,000,000 total citations from 130,000 papers
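A minimal sketch of how a Red-Link might be spotted in extracted reference text, assuming LANL references follow the archive-name/seven-digit form of the example hep-th/0006010. The pattern and the name `find_red_links` are illustrative assumptions, not the project's extraction code, and Orange-Link matching against journal references is not attempted here.

```python
import re

# Assumed form: archive name, a slash, then seven digits (as in hep-th/0006010).
RED_LINK = re.compile(r"\b[a-z]+(?:-[a-z]+)?/\d{7}\b")


def find_red_links(reference_text):
    """Return every LANL-style identifier found in a block of reference text."""
    return RED_LINK.findall(reference_text)


# Made-up reference list built from the two examples on the slide.
references = "[1] J. Doe, hep-th/0006010. [2] R. Roe, Phys. Rev. D56 6588 (1997)."
red_links = find_red_links(references)
```

Only the first reference matches; the journal-style reference would need a separate lookup against the archive's journal-reference meta-data.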

          Citations   % (red+orange)   % all (3083763)
Red          259437           40.02%             8.41%
Orange       388904           59.98%            12.61%
Total        648341          100.00%            21.02%
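The percentages in the table follow directly from the three counts. The short check below reproduces them; the variable names are illustrative only.

```python
red = 259437
orange = 388904
all_citations = 3083763

identified = red + orange


def pct(part, whole):
    """Percentage of whole, rounded to two decimal places."""
    return round(100.0 * part / whole, 2)


shares = {
    "red_of_identified": pct(red, identified),
    "orange_of_identified": pct(orange, identified),
    "red_of_all": pct(red, all_citations),
    "orange_of_all": pct(orange, all_citations),
    "identified_of_all": pct(identified, all_citations),
}
```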


Identification Ratio

• Currently 25% of citations are being identified

[Figure: Citation Validation - Identification Ratio Against Time. x-axis: Month (1991-08 to 2000-02); y-axis: Identified Citations Ratio (0 to 0.3); series: Red Links/Total Citations, Orange Links/Total Citations, Total Identified/Total Citations, each with a trend line]


Identification Ratio - hep-th

• Currently 40% of citations from hep-th (High Energy Physics - Theory) papers directly cite pre-print articles in LANL

[Figure: Citation Validation - Identification Ratio Against Time (hep-th). x-axis: Month (1991-08 to 2000-03); y-axis: Identified Citations Ratio (0 to 0.7); series: Red (121241 - 21.04%), Orange (85202 - 14.79%), Identified (206443 - 35.83%)]


Citation Latencies

• The raw data show that the latency of the citation peak has been reducing over the period of the archive

[Figure: Frequency of Citation Latencies: 1992-1999. x-axis: Time Difference/Months (0 to 96); y-axis: Citations (0 to 5000); series: one curve per year, 92 to 99]
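A latency histogram of this kind could be built as sketched below. The sketch assumes that latency means the number of whole months between the deposit of the cited paper and the deposit of the citing paper; that definition, the (year, month) tuples and the function names are assumptions made for illustration.

```python
from collections import Counter


def months_between(cited_ym, citing_ym):
    """Whole months from the cited paper's deposit to the citing paper's deposit."""
    (cited_year, cited_month), (citing_year, citing_month) = cited_ym, citing_ym
    return (citing_year - cited_year) * 12 + (citing_month - cited_month)


def latency_histogram(citations):
    """citations: iterable of (cited (year, month), citing (year, month)) pairs."""
    histogram = Counter()
    for cited, citing in citations:
        histogram[months_between(cited, citing)] += 1
    return histogram


# Made-up citation pairs for illustration only.
sample_citations = [
    ((1997, 6), (1997, 9)),
    ((1997, 6), (1998, 6)),
    ((1998, 1), (1998, 4)),
]
latencies = latency_histogram(sample_citations)
peak_latency = max(latencies, key=latencies.get)
```

Grouping the pairs by the year of the cited paper before counting would give one curve per year, as in the figure.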


Citation Latencies

[Figure: Normalised and Scaled Graph of Citation Latency: 1992-1999. x-axis: Time Difference/Months (0 to 96); y-axis: References (0 to 0.18); series: one curve per year, 92 to 99]

• Normalised data are corrupted by an artefact in the citation ratios (used to adjust for time)


Updates to LANL Pre-prints

• The LANL archive allows authors to update articles that they have deposited

[Figure: Multiple Updates by LANL Subfield (based on LANL meta-data). x-axis: No. of Papers with Updates (0 to 25000); y-axis: sub-fields adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int, hep-th, hep-ph; series: No Updates, 1 Update, 2 Updates, 3 Updates, 4 Updates]


Update Delay

[Figure: Update Frequency against Time (normalised) (based on moving average over 3 points). x-axis: Time Difference (days) (0 to 70); y-axis: Frequency (Papers) (0 to 0.08); series: 1st Update, 2nd Update, 3rd Update, 4th Update, each as a trend]

• There are too few values to provide an accurate frequency, so a trend must be estimated
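The figure title mentions a moving average over 3 points. One way to compute such a trend is sketched below; whether the original used a centred or a trailing window is not stated, so the centred window here is an assumption.

```python
def moving_average_3(values):
    """Centred 3-point moving average; the two endpoints are dropped."""
    return [
        (values[i - 1] + values[i] + values[i + 1]) / 3.0
        for i in range(1, len(values) - 1)
    ]


# Made-up daily update counts for illustration only.
daily_counts = [3, 0, 6, 3, 0, 3]
smoothed = moving_average_3(daily_counts)
```

With sparse counts like these, the smoothed series hides the zero-count days that would otherwise make the frequency curve jagged.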


Article Embryology

[Figure: hep-th. x-axis: 1991-07 to 2000-01; y-axis: Papers (0 to 200); series: With J-R, With J-R/Report, Report, Unknown]

• The curve for papers with a journal reference [J-R] crosses the curve for papers without a J-R at an age of 13 months, suggesting a delay of about 13 months between pre-print and post-print
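A crossover of this kind can be located by scanning the two series for the first age at which papers with a journal reference are at least as numerous as those without. The function name and the sample counts below are made up for illustration and are not archive data.

```python
def first_crossover(ages, with_jr, without_jr):
    """Return the first age at which with_jr >= without_jr, or None if never."""
    for age, with_count, without_count in zip(ages, with_jr, without_jr):
        if with_count >= without_count:
            return age
    return None


# Made-up counts for illustration only; they are not taken from the archive.
ages_in_months = [0, 6, 12, 18, 24]
with_jr = [5, 30, 55, 80, 90]
without_jr = [95, 70, 45, 20, 10]
crossover_age = first_crossover(ages_in_months, with_jr, without_jr)
```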


Article State by Sub-field

• Self-professed state of the article (hep is updated by SLAC/SPIRES)

[Figure: Retrospective Paper State by LANL Subfield. x-axis: Papers (0 to 25000); y-axis: sub-fields adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, hep-ph, hep-th, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int; series: With J-R, With J-R/Report, Reports, To Appear, Submitted, Accepted, Thesis, Other]


State of Cited Articles

[Figure: Does Author Impact affect the state of articles? x-axis: Author Impact Level (total papers): All (132218), Low (38.66%), Medium (22.23%), High (2.62%); y-axis: 0% to 100%; series: Accepted, J.Ref, J.Ref/Report, Report, Review, Submitted, Unknown]

• Broken down by papers written by authors with given impact level

• Author impact determined by “Red-Link” citations


Author Deposit Rates

• 50% of deposits of new papers occur within 4 months of the author’s previous paper

[Figure: Author Frequency of Deposits. x-axis: Time (Months) (0 to 48); left y-axis: Deposits (0 to 35000); right y-axis: Cumulative (0 to 154000); series: Real Time, Cumulative]
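The 50% figure corresponds to the median gap between an author's successive deposits. A minimal sketch of that calculation follows, assuming each deposit is recorded as an (author, month index) pair; the layout and names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def inter_deposit_gaps(deposits):
    """Gaps in months between each author's consecutive deposits.

    deposits: iterable of (author, month_index) pairs (hypothetical layout).
    """
    months_by_author = defaultdict(list)
    for author, month in deposits:
        months_by_author[author].append(month)
    gaps = []
    for months in months_by_author.values():
        months.sort()
        gaps.extend(later - earlier for earlier, later in zip(months, months[1:]))
    return gaps


# Made-up deposits for illustration only.
sample_deposits = [("a", 0), ("a", 2), ("a", 5), ("b", 10), ("b", 14), ("c", 7)]
gaps = inter_deposit_gaps(sample_deposits)
median_gap = median(gaps)
```

Authors with a single deposit contribute no gap, so they do not affect the median.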


Author Impact Analysis

• There is a co-author list for each paper

• Author impact is defined as the number of citations an author receives divided by the number of papers that author has deposited (the mean number of citations for an author)

• By applying this to each author, a list of author names with their impact is constructed

• The authors are ranked by their impact

• The set of authors is then divided into three impact sets: lowest 25%, middle 50% and highest 25%.
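The first steps above can be sketched as follows. The sketch assumes every co-author is credited with all of a paper's citations, which the slide does not state, and it stops at the ranking because the exact criterion used to cut the ranked list into the three sets is not given. All names are illustrative.

```python
def author_impact(papers):
    """Mean citations per deposited paper for each author.

    papers: iterable of (co-author list, citation count) pairs.
    Assumption: each co-author receives full credit for the paper's citations.
    """
    citations = {}
    deposited = {}
    for authors, cites in papers:
        for name in authors:
            citations[name] = citations.get(name, 0) + cites
            deposited[name] = deposited.get(name, 0) + 1
    return {name: citations[name] / deposited[name] for name in deposited}


def rank_by_impact(impact):
    """Author names ordered from highest to lowest impact (ties by name)."""
    return sorted(impact, key=lambda name: (-impact[name], name))


# Made-up papers for illustration only.
sample_papers = [(["x", "y"], 10), (["x"], 2), (["z"], 0)]
impact = author_impact(sample_papers)
ranking = rank_by_impact(impact)
```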

[Figure: Impact against Authors, with regions labelled High, Medium and Low]


Author Impact Quartiles

• High impact authors update more than medium or low

• High and medium impact authors deposit more papers than low

Quartile    Total   % Total   Citations   Papers   Citations/Author/Paper   Deposits   Mean Updates/Author
High 25%      798     2.09%     240,092    2,732                     0.11      6,720                  0.48
Med 50%     9,262    24.20%     733,272   37,318                  0.00212     93,671                  0.37
Low 25%    28,211    73.71%     251,925   67,951                 0.000131    165,971                  0.27
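The derived columns of the table can be reproduced from the raw counts: "% Total" is each group's share of all authors, and "Citations/Author/Paper" matches citations divided by papers and then by authors. The check below uses only figures from the table; the dictionary layout is illustrative.

```python
rows = {
    "High": {"authors": 798, "citations": 240092, "papers": 2732},
    "Med": {"authors": 9262, "citations": 733272, "papers": 37318},
    "Low": {"authors": 28211, "citations": 251925, "papers": 67951},
}

total_authors = sum(row["authors"] for row in rows.values())

derived = {}
for group, row in rows.items():
    derived[group] = {
        # Share of all identified authors, in percent.
        "author_share_pct": round(100.0 * row["authors"] / total_authors, 2),
        # Citations divided by papers, then by authors.
        "citations_per_author_per_paper": row["citations"] / row["papers"] / row["authors"],
    }
```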


Author’s Papers

[Figure: Cumulative Paper Frequency, by Author Impact. x-axis: Citations (0 to 240); left y-axis: Papers (0 to 35000); right y-axis: Papers (High Impact) (0 to 3000); series: Medium (27317 - 44.91%), Low (30981 - 50.93%), High (2533 - 4.17%)]

• Low impact authors rarely, if ever, have a single high impact paper


Citation Spread

[Figure: Histogram of Citations per Paper (author impact). 30,000 papers were by authors with no citation. x-axis: No citations, 1 Citation, 2/3 Citations, 4/5/6 Citations, 7/8/9/10 Citations, 11 or more Citations; y-axis: Papers (0 to 40000); series: High (2.53%), Medium (34.55%), Low (62.92%)]

• A small number of papers receive a very large number of citations
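The histogram groups papers into the six citation bands shown on its axis. A small sketch of that binning follows; the band labels come from the slide, while the function names and sample counts are made up for illustration.

```python
from collections import Counter

# Inclusive citation ranges for the first five bands on the histogram axis.
BANDS = [
    (0, 0, "No citations"),
    (1, 1, "1 Citation"),
    (2, 3, "2/3 Citations"),
    (4, 6, "4/5/6 Citations"),
    (7, 10, "7/8/9/10 Citations"),
]


def band_label(citations):
    """Map a paper's citation count to its histogram band."""
    for low, high, label in BANDS:
        if low <= citations <= high:
            return label
    return "11 or more Citations"


def citation_histogram(citation_counts):
    """Number of papers in each band."""
    return Counter(band_label(count) for count in citation_counts)


# Made-up citation counts for illustration only.
sample_counts = [0, 0, 1, 3, 5, 8, 11, 250]
histogram = citation_histogram(sample_counts)
```

The open-ended last band is where the few very highly cited papers end up, which is why the spread is hard to see on a linear axis.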