IDENTIFYING OPEN ACCESS ARTICLES: VALID AND INVALID METHODS David Goodman Palmer School of Library and Information Science, Long Island University Kristin Antelman Associate Director for Information Technology, NCSU Libraries Nisa Bakkalbasi General Science Librarian, Yale Library XXV Charleston Conference, Nov. 4, 2005

IDENTIFYING OPEN ACCESS ARTICLES:

VALID AND INVALID METHODS

David Goodman Palmer School of Library and Information Science, Long Island University

Kristin Antelman Associate Director for Information Technology, NCSU Libraries

Nisa Bakkalbasi General Science Librarian, Yale Library

XXV Charleston Conference, Nov. 4, 2005


We gratefully acknowledge:

• the courtesy of ISI to Stevan Harnad in supplying their citation data

• the courtesy of Stevan Harnad in supplying his analyzed data to us

• his acceptance of our offer to carry out a manual evaluation

• the assistance of Chawki Hajjem and Stevan Harnad in explaining the details of their methodology, and

• their helpful comments on our measurements


Why do we want to identify OA?

So users can find it (findability)

So people can link to it (linkability)

To measure % of articles OA

To measure OA Advantage (OAA)
(increase in citations by being published OA)
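
As a rough illustration of the OAA measure, the sketch below assumes it is computed as the percent increase in mean citations of OA articles over non-OA articles. The function name `oa_advantage` and the citation counts are hypothetical, chosen only to show the arithmetic; they are not data from any study cited here.

```python
from statistics import mean


def oa_advantage(oa_citations, non_oa_citations):
    """Percent increase in mean citations of OA articles over non-OA articles.

    Assumption: OAA is a relative difference of mean citation counts.
    """
    oa_mean = mean(oa_citations)
    non_oa_mean = mean(non_oa_citations)
    if non_oa_mean == 0:
        raise ValueError("non-OA mean citation count is zero; OAA is undefined")
    return 100.0 * (oa_mean - non_oa_mean) / non_oa_mean


# Hypothetical citation counts, for illustration only.
oa = [6, 4, 8, 6]      # mean 6
non_oa = [3, 5, 4, 4]  # mean 4
print(f"OAA = {oa_advantage(oa, non_oa):.0f}%")  # OAA = 50%
```

Under this definition, an OAA of 300% would mean OA articles receive four times the mean citations of non-OA articles.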


Specific Fields

Lawrence 2001 OA of conference proceedings in electrical engineering and computer science

Location by Research Index using Google

Matched pairs

Kurtz 2003-5 Astronomy papers in ADS (Astrophysics Data System)


Specific Journals

Wren 2005 References to articles in selected medical journals

Other individual journal studies...


Multiple fields

Antelman, 2004 selected subject areas in many academic fields

manual identification

OAA = 15% - 40%, depending on subject


Multiple fields

Brody, Harnad et al, 2004-5 selected subject areas in science

automated identification in arXiv by algorithm

automated citation check in WoS

OAA = up to 300%, depending on subject


Multiple fields

Brody, Stamerjohanns, Harnad et al, 2004-5 selected subject areas in social science

automated identification in arXiv or web by algorithm

automated citation check in WoS

OAA = up to 200%, depending on subject


Multiple fields

Hajjem, Harnad et al., 2005- all subject areas in science & social science

automated identification on web by algorithm

automated citation check in WoS

OAA = up to 200%, depending on subject

(This data has been posted by Hajjem at Soton, but is still unpublished)


Our Purpose

to confirm validity of algorithmic OA/non-OA determinations,

to verify measurements of OA and OAA


Our Technique

Selected years and subject fields from Hajjem, Harnad, et al., with OA determination and citations

Sample from algorithm's OA and non-OA

Manual check on the web to either confirm OA or not find OA (Google, author's sites, etc.)

Tabulation of ISI citations to determine OA Advantage

(more complete details forthcoming)
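
The sampling step is only outlined here, so the following is a minimal sketch under stated assumptions: that a random sample is drawn separately from the articles the algorithm labeled OA and from those it labeled non-OA, with a similar number taken from each group. The record layout, the field name `algorithm_label`, and the function name are hypothetical.

```python
import random


def sample_by_algorithm_label(records, per_label, seed=0):
    """Draw up to `per_label` random records from each algorithm label group."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = {"OA": [], "non-OA": []}
    for rec in records:
        strata[rec["algorithm_label"]].append(rec)
    return {
        label: rng.sample(items, min(per_label, len(items)))
        for label, items in strata.items()
    }


# Hypothetical records: roughly one article in seven labeled OA by the algorithm.
records = [
    {"id": i, "algorithm_label": "OA" if i % 7 == 0 else "non-OA"}
    for i in range(1000)
]
sample = sample_by_algorithm_label(records, per_label=5)
for label, items in sample.items():
    print(label, [rec["id"] for rec in items])
```

Each sampled record would then be checked by hand on the web and tallied into a two-by-two decision table.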


OA articles include

Published authentic text in OA Journals

Posted authentic text--publisher's PDF

Posted author's corrected manuscript


Dubious OA items include:

Embargoed published articles, after embargo ends

Embargoed author's manuscripts, after embargo ends

Editorials, Letters, Review articles

Abstract-only publication


Non-OA articles include:

Published authentic text in subscription journals

Abstracts on publisher's site

Listing in
• title pages
• alerting services
• blogs
• course notes
• references in other articles
• links from a posting to publisher's site


Set One Examined

Articles from year 2002

(Classical) Biology (ISI category)

1% sample


Decision Table (biology)

                       Manual detection
Algorithm detection    OA      non-OA    TOTAL
OA                     106     160       266
non-OA                 32      239       271
TOTAL                  138     399       537


Interpretation (biology):

• Of the 266 items labeled "OA" by the algorithm

only 106 were actually OA

• Of the 138 items actually OA,

the algorithm missed 32

• Of the total 537 items,

the algorithm got 345 right, and 192 wrong.
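
The three statements above correspond to standard two-by-two table ratios. The sketch below recomputes them from the biology decision table; the function name `table_metrics` and the labels "precision" and "recall" are not used in the slides and are introduced here only for illustration. The ratios describe the checked sample, not the full ISI category.

```python
def table_metrics(oa_confirmed, oa_not_confirmed, nonoa_found_oa, nonoa_not_found):
    """Ratios for a decision table with algorithm rows and manual columns."""
    labeled_oa = oa_confirmed + oa_not_confirmed    # items the algorithm called OA
    actually_oa = oa_confirmed + nonoa_found_oa     # items found OA by manual check
    total = labeled_oa + nonoa_found_oa + nonoa_not_found
    right = oa_confirmed + nonoa_not_found          # algorithm and manual check agree
    return {
        "precision": oa_confirmed / labeled_oa,  # share of "OA" labels that were correct
        "recall": oa_confirmed / actually_oa,    # share of real OA items the algorithm found
        "accuracy": right / total,
        "right": right,
        "wrong": total - right,
    }


# Counts from the biology decision table.
biology = table_metrics(106, 160, 32, 239)
print(f"precision {biology['precision']:.1%}")  # precision 39.8%
print(f"recall    {biology['recall']:.1%}")     # recall    76.8%
print(f"accuracy  {biology['accuracy']:.1%}")   # accuracy  64.2%
print(biology["right"], biology["wrong"])       # 345 192
```

The same function can be applied to any other decision table with the same row and column layout.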


Signal theory Distributions:


Set Two Examined

Articles from year 2000

Sociology (ISI category)

8% sample


Decision Table (sociology)

                       Manual detection
Algorithm detection    OA      non-OA    TOTAL
OA                     29      148       177
non-OA                 25      151       176
TOTAL                  54      299       353


Interpretation (sociology):

• Of the 177 items labeled "OA" by the algorithm only 29 were actually OA

• Of the 54 items actually OA, the algorithm missed 25

• Of the total 353 items, the algorithm got 180 right, and 173 wrong.


Signal theory Distributions:


Our Determinations:

Comparison: note that the apparent similarity is due to compensating errors. There are many more non-OA articles than OA, so the small error on missed OA cancels out the big error on over-coded OA.

                     Biology 2002    Sociology 2000    Biology 2002
                     %OA             %OA               OAA
Algorithm's value    14%             23%               51%
Our value            16%             15%               63%

(Sociology OAA not measurable due to error in matching ISI data)
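
The slides do not say how the corrected %OA values were derived. One reading that reproduces them, offered here as an assumption, is to reweight the algorithm's OA share by the confirmation and miss rates from the two decision tables. The function name `reweighted_oa_share` is hypothetical.

```python
def reweighted_oa_share(algo_oa_share, oa_confirmed, oa_not_confirmed,
                        nonoa_found_oa, nonoa_not_found):
    """Estimate the true OA share from the algorithm's share and a checked sample.

    Assumption: the sample was drawn separately from the algorithm's OA and
    non-OA groups, so each row of the decision table gives a rate for its group.
    """
    confirm_rate = oa_confirmed / (oa_confirmed + oa_not_confirmed)
    miss_rate = nonoa_found_oa / (nonoa_found_oa + nonoa_not_found)
    return algo_oa_share * confirm_rate + (1 - algo_oa_share) * miss_rate


# Algorithm's %OA from the comparison table, counts from the decision tables.
biology = reweighted_oa_share(0.14, 106, 160, 32, 239)
sociology = reweighted_oa_share(0.23, 29, 148, 25, 151)
print(f"Biology 2002:   {biology:.0%}")    # Biology 2002:   16%
print(f"Sociology 2000: {sociology:.0%}")  # Sociology 2000: 15%
```

In biology the large over-coding error applies to the small group labeled OA, while the small miss rate applies to the much larger group labeled non-OA, which is how the two errors can roughly cancel.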


All Determinations (including ours): Unavoidable Systematic Errors

Problematic material types

Delayed OA articles

Articles posted long after publication

Variation in titles

Different publications with same titles

Articles removed from web

Invisibility to the search engines

Errors due to ISI inaccuracy


All Determinations based on Soton data (including ours): Source of possible confusion

OA Journals consistently omitted (all journals with 100% OA)*

Journals without OA consistently omitted (all journals with 0% OA)

* thus, all his OA and OAA determinations are for "Green" self-archiving only, not including "Gold" OA Journals


At least some Soton Data: other known possible sources of confusion or error

All journals given equal weight regardless of size

Data averaged by journal

Google not usually used in search

Just arXiv used in some searches (whether or not appropriate)

Inadequate testing of algorithms


Determinations of % OA and OAA

Depend on the accuracy of identification of individual items

Therefore, algorithmic determinations at best accurate only by accident


Conclusions:

I. Accuracy is still only possible
a. with manual determinations (which are too tedious for practical use) or
b. with well-defined searches in well-defined fields (such as particular journals or repositories)

II. Generalized algorithmic search engines of acceptable accuracy have yet to be developed (and tested)
