Presentation from Chuck Koscher at the 2009 Technical Working Group meeting in Cambridge, MA
CrossRef 2009 Annual Member Meeting - Boston Page 1
System status

Deposit times (2009)

                   June            July            August          Sept            October
Less than 5 min:   107888 (53 %)   141105 (83 %)   131661 (91 %)   83379 (57 %)    33546 (52 %)
Less than 1 hr:    35189 (17 %)    22389 (13 %)    10753 (7 %)     33829 (23 %)    18165 (28 %)
Less than 6 hr:    31666 (15 %)    3666 (2 %)      903 (0 %)       24201 (16 %)    8037 (12 %)
Less than 12 hr:   23482 (11 %)    181 (0 %)       0 (0 %)         2411 (1 %)      1855 (2 %)
Less than 18 hr:   4019 (1 %)      713 (0 %)       0 (0 %)         968 (0 %)       1950 (3 %)
Less than 24 hr:   0 (0 %)         3 (0 %)         0 (0 %)         0 (0 %)         0 (0 %)
More than 24 hr:   0 (0 %)         1 (0 %)         1 (0 %)         1 (0 %)         0 (0 %)
Total deposits:    203001          168058          143318          144790          63555
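A breakdown like the deposit-time table can be rebuilt from raw per-deposit processing times. The sketch below is illustrative only: the function name, the input (latencies in seconds) and the choice to count each deposit once, in the first bucket whose upper bound it falls under, are assumptions rather than CrossRef's reporting code. Percentages are truncated because the table appears to truncate (31666 of 203001 is shown as 15 %).

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds, in seconds, for the first six buckets of the table.
BOUNDS = [5 * 60, 3600, 6 * 3600, 12 * 3600, 18 * 3600, 24 * 3600]
LABELS = [
    "Less than 5 min", "Less than 1 hr", "Less than 6 hr",
    "Less than 12 hr", "Less than 18 hr", "Less than 24 hr",
    "More than 24 hr",
]


def bucket_deposit_times(latencies_s):
    """Return {label: (count, percent)} for a list of latencies in seconds.

    Hypothetical helper: each latency lands in the first bucket whose
    upper bound is strictly greater than it; percentages are truncated.
    """
    counts = Counter(LABELS[bisect_right(BOUNDS, t)] for t in latencies_s)
    total = len(latencies_s)
    return {
        label: (counts[label], 100 * counts[label] // total if total else 0)
        for label in LABELS
    }


if __name__ == "__main__":
    sample = [10, 299, 300, 7200, 90000]  # made-up latencies for illustration
    for label, (count, percent) in bucket_deposit_times(sample).items():
        print(f"{label}: {count} ({percent} %)")
```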
System status

Operations changes
• Starting to use HAProxy for internal load balancing and redundancy
• Using Alertra for external monitoring
• VMware virtual servers
• Now migrating Oracle from 9 to 11g (allows active read-only standby)
• Using Jira for all [email protected] activities
• Berkeley DB based service for OpenURL DOI queries (metadata lookups)

Testing a process for <unstructured_citation> elements
Two technologies being used:
• refXpress from Inera, which parses a reference and breaks it into parts
• CitationQueryEngine (CQE), an internally developed Lucene-based search

Trial run
• Number of unstructured citations: 1,158,889
• Number of DOIs processed: 3,150,525
• Number of refXpress DOIs found: 47,165
• Number of CQE DOIs found (score > 2.2): 139,721
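The CQE figure is reported against a score cutoff of 2.2. A minimal sketch of how such a cutoff might be applied to search candidates follows; the (doi, score) pair shape and the function name are hypothetical, and nothing here reflects the actual CQE or Lucene code.

```python
CQE_SCORE_THRESHOLD = 2.2  # cutoff quoted in the trial-run figures


def accept_match(candidates, threshold=CQE_SCORE_THRESHOLD):
    """Return the DOI of the best-scoring candidate, or None.

    `candidates` is a list of (doi, score) pairs (an assumed shape).
    A match is accepted only if the top score is strictly above the
    threshold, mirroring the "score > 2.2" wording.
    """
    if not candidates:
        return None
    doi, score = max(candidates, key=lambda pair: pair[1])
    return doi if score > threshold else None


if __name__ == "__main__":
    # Made-up DOIs and scores, for illustration only.
    print(accept_match([("10.1000/a", 1.9), ("10.1000/b", 3.4)]))
    print(accept_match([("10.1000/a", 2.2)]))
```

Raising the threshold trades recall for fewer false positives, which is presumably why the cutoff is stated alongside the count.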
<citation key="10.1016/S0736-0266(02)00040-2-BIB21">
  <author>Valero-Cuevas</author>
  <cYear>2000</cYear>
  <unstructured_citation>Applying principles of robotics to understand the biomechanics, neuromuscular control and clinical rehabilitation of human digits. In: IEEE International Conference on Robotics and Automation, San Francisco, CA, 2000.</unstructured_citation>
</citation>

<journal_title>Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu</surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>

<journal_title>Biochimica et Biophysica Acta (BBA) - Bioenergetics</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>C</given_name>
    <surname>Xu</surname>
  </contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>
<citation key="b53_366">
  <unstructured_citation>53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .</unstructured_citation>
</citation>

<doi type="journal_article">10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides</article_title>

<doi type="journal_article">10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
  <contributor sequence="first" contributor_role="author">
    <given_name>O.S.</given_name>
    <surname>Gudmundsson</surname>
  </contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability</article_title>
Changes (problems)

Notable software error in the past 12 months
• URLs in Handle were rewritten with an older value (affected some publishers who had deposited as-crawled URLs AND made URL modifications via ownership transfer)
Medium-big changes
• Book volume-title / author / year rule: match on (only) book title DOIs (sample2)
• Added a false positive prevention rule: IF an (XML) query contains an article title and that title is not an exact match with the deposited title, DO NOT MATCH, except if author and first page are an EXACT match
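The false positive prevention rule can be read as a small predicate over a query and a candidate record. The sketch below is one reading of it, not the deployed matcher: the dict keys, the function name and the use of plain string equality for "exact match" are all assumptions.

```python
def passes_title_rule(query, deposited):
    """Apply the false positive prevention rule to one candidate match.

    Both arguments are dicts that may carry 'article_title', 'author'
    and 'first_page' (hypothetical field names). "Exact match" is taken
    to mean plain string equality, which is an assumption.
    """
    query_title = query.get("article_title")
    if not query_title:
        return True  # the rule only applies when the query has a title
    if query_title == deposited.get("article_title"):
        return True
    # Titles differ: allow the match only on exact author AND first page.
    return (
        query.get("author") is not None
        and query.get("first_page") is not None
        and query["author"] == deposited.get("author")
        and query["first_page"] == deposited.get("first_page")
    )


if __name__ == "__main__":
    query = {"article_title": "The effect of conformation on the membrane permeability",
             "author": "Gudmundsson", "first_page": "383"}
    same_pages = {"article_title": "The effect of conformation on the membrane permeation",
                  "author": "Gudmundsson", "first_page": "383"}
    other_pages = dict(same_pages, first_page="403")
    print(passes_title_rule(query, same_pages))   # title differs, author + page agree
    print(passes_title_rule(query, other_pages))  # title differs, page differs
```

The two candidates mirror the Journal of Peptide Research records shown earlier, where near-identical titles are told apart by first page.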
Small-medium changes
• Matching special characters in author names
• Matching compound surnames
• Removed ability to avoid conflicts
• DOI character limits: "a-z", "A-Z", "0-9" and "-._;()/"
• Title lock-down (ISSN check disallowing a deposit)
Issues

Ongoing ...
• Too many alternative (publication) titles. Be careful! They can really mess up title fuzzy matching (we do have a Schematron monitor)
• Deleting DOIs:
    - Change the publication title
    - Change the DOI's title (article title) to the DOI itself
    - Remove optional metadata
    - Set publication date to the deletion date
• Conflicts: conflicts reduce matching rates!
Timestamps
• DOIs are deposited with a timestamp to ensure the latest metadata gets inserted.
• Timestamps are essential when we have to re-process deposits.
• Problems occur when DOI ownership changes (e.g. what is the timestamp?)
• Solution (new): CrossRef will provide a means to retrieve the current timestamp.
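The role of the deposit timestamp can be illustrated with a "newest wins" update. This is a sketch under assumptions: the in-memory dict, the function names and the rule that a timestamp must be strictly greater than the stored one are illustrative, not the deposit system's actual logic.

```python
def apply_deposit(store, doi, timestamp, metadata):
    """Insert metadata only if `timestamp` is newer than the stored one.

    `store` maps DOI -> (timestamp, metadata). Returns True if applied.
    Assumption: equal or older timestamps are rejected.
    """
    current = store.get(doi)
    if current is not None and timestamp <= current[0]:
        return False
    store[doi] = (timestamp, metadata)
    return True


def current_timestamp(store, doi):
    """Hypothetical lookup a new owner could call before depositing."""
    entry = store.get(doi)
    return entry[0] if entry else None


if __name__ == "__main__":
    store = {}
    apply_deposit(store, "10.1000/x", 20091003, {"title": "new"})
    # Re-processing an older deposit must not overwrite newer metadata.
    print(apply_deposit(store, "10.1000/x", 20090101, {"title": "old"}))
    print(current_timestamp(store, "10.1000/x"))
```

This also shows why ownership transfer is awkward: a new owner who does not know the stored timestamp cannot pick a value that is guaranteed to be accepted, hence the proposed retrieval facility.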
===========================================
Created: 2006-04-04 04:10:03.0
ConfID: 263262
CauseID: 246648646
OtherID: 64341060,
JT: Ophthalmic and Physiological Optics
MD: Brown, 15 ,3,163,1995,Differences in visual acuity between the eyes: determination of normal limits in a clinical population
DOI: 10.1046/j.1475-1313.1995.9590568m.x(85579-R 263262-null )
Schematron rules

Contributor checks
• Alert if only a single author is present (not reported but recorded)
• Alert if only first initial is deposited
• Check for numbers in given name / surname
• Check for punctuation in given name / surname; currently checks for: _\/*@()[]
• Check for ndash in name
• Check for Jr or Sr in surname
• Alert if all caps
• Alert if more than 3 spaces are present
• Alert if space in surname when no given name is present
• Alert if surname ends with jr, JR
• Alert if surname contains 'et. al.'
• Alert if surname/given name contains & or &# (malformed entity)
• Alert if multiple ??? are present

Page ranges
• Alert for _ or - in first or last page
• Alert if first and last page are identical

Edition / issue info
• Check for 'edition' in <edition>
• Check for 'issue' in <issue>
• Check for 'no' or 'number' in volume/issue/edition

Citation checks
• All surname checks
• All page range checks
• Year range check

Article title
• Check for single-word title
• Alert if all caps
• Alert if title contains & or &# (malformed entity)

Other
• Alert for year beyond current year
• Alert if neither first page nor author is present
• Alert if more than 2 alternate titles
• Alert if DOI contains a character not in the allowed set
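Schematron expresses such rules as XPath assertions over the deposited XML. As a stand-in, a few of the contributor and page-range checks are sketched below in plain Python; the function names, alert strings and the decision to apply the all-caps check only to surnames longer than one letter are assumptions, not the production rule set.

```python
import re

PUNCTUATION = set("_\\/*@()[]")  # the characters listed in the rule
JR_SR = re.compile(r"\b(?:jr|sr)\b", re.IGNORECASE)


def check_contributor(given_name, surname):
    """Return a list of alert strings for one contributor (illustrative subset)."""
    alerts = []
    names = [n for n in (given_name, surname) if n]
    if any(ch.isdigit() for name in names for ch in name):
        alerts.append("number in name")
    if any(ch in PUNCTUATION for name in names for ch in name):
        alerts.append("punctuation in name")
    if surname and JR_SR.search(surname):
        alerts.append("Jr or Sr in surname")
    if surname and len(surname) > 1 and surname.isupper():
        alerts.append("all caps in surname")
    if given_name and len(given_name.rstrip(".")) == 1:
        alerts.append("only first initial")
    return alerts


def check_pages(first_page, last_page):
    """Return page-range alerts: stray _ or -, or identical first/last page."""
    alerts = []
    if any(ch in "_-" for page in (first_page, last_page) if page for ch in page):
        alerts.append("_ or - in page")
    if first_page and first_page == last_page:
        alerts.append("first and last page identical")
    return alerts


if __name__ == "__main__":
    print(check_contributor("C", "Xu"))
    print(check_contributor("John", "SMITH JR"))
    print(check_pages("383-392", None))
```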
224,000 DOIs with bad page number (really affects matching)
DOI links that still work: 14,985 journals crawled in 2009
69.25% are confirmed good, 22.8% unconfirmed, 5% confirmed not good

sum(dois)    sum(checked)  sum(confirmed)  sum(semiconfirmed)  sum(nonconfirmed)  sum(bad)  sum(login)
25,977,348   361,514       206,204         44,168              82,565             1,140     16,950
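The slide does not say how the three percentages are derived from the column sums. One reading that reproduces them to within rounding is sketched below; the grouping (confirmed plus semiconfirmed as "good", bad plus login as "not good", all relative to checked DOIs) is an assumption, not something the source states.

```python
# Column sums copied from the crawl summary.
CHECKED = 361_514
CONFIRMED = 206_204
SEMICONFIRMED = 44_168
NONCONFIRMED = 82_565
BAD = 1_140
LOGIN = 16_950


def pct_of_checked(part):
    """Share of checked DOIs, as a percentage."""
    return 100 * part / CHECKED


good = pct_of_checked(CONFIRMED + SEMICONFIRMED)  # assumed "confirmed good"
unconfirmed = pct_of_checked(NONCONFIRMED)        # assumed "unconfirmed"
not_good = pct_of_checked(BAD + LOGIN)            # assumed "confirmed not good"

if __name__ == "__main__":
    print(f"good {good:.2f}%  unconfirmed {unconfirmed:.2f}%  not good {not_good:.2f}%")
```

Note that the five outcome columns do not add up to sum(checked), so the three percentages are not expected to total 100.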
Western Journal of Medicine   10.1136/ewjm.172.6.364    http://www.pubmedcentral.nih.gov/
Western Journal of Medicine   10.1136/ewjm.172.2.84     http://www.pubmedcentral.nih.gov/
Western Journal of Medicine   10.1136/ewjm.172.1.43     http://www.pubmedcentral.nih.gov/
Western Journal of Medicine   10.1136/ewjm.172.1.61-a   http://www.pubmedcentral.nih.gov/
Western Journal of Medicine   10.1136/ewjm.174.2.103    http://www.pubmedcentral.nih.gov/
Metadata Quality

Schematron reports are run once a week.

From: <[email protected]>
Date: October 3, 2009 12:42:27 PM EDT
To: <[email protected]>, <[email protected]>
Subject: Schematron Report for prefix(es) 10.1109 [email protected]

The results of a weekly metadata quality check are listed below. The affected DOIs were deposited successfully but the metadata attached to the DOI may need some attention.

http://www.crossref.org/schematron/data/st_20091003_5431.xml
http://www.crossref.org/schematron/data/st_20091003_5347.xml
http://www.crossref.org/schematron/data/st_20091003_5430.xml
http://www.crossref.org/schematron/data/st_20091003_5348.xml
http://www.crossref.org/schematron/data/st_20091003_5411.xml
System rewrite

• May 2008: Board endorses plan to address a significant rewrite/upgrade
• June 2008 - Feb 2009: TWG subgroup (rewrite2) meets to define requirements and other project parameters
    - Oct: Scenario options documented and cost comparisons profiled; started negotiations with Atypon re: new contract
    - Nov: Report presented to board and to rewrite2 group for direction and validation
• Dec 08 - May 09: Negotiations with Atypon
• Oct 12, 09: New contract signed

Core Needs
• That CrossRef should ultimately own the intellectual property in the software at the heart of its operations
• That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system
• That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data
• Recognized that CrossRef is not likely to establish internal resources sufficient to independently manage the development and maintenance of a system of this magnitude
System rewrite
[Figure: timeline 2009, 2010, 2011. Systems shown: Existing System (EDS); EDS mods to use NQS; New Query System (NQS); New Deposit System (NDS). Transaction labels: both query and deposit transactions; deposit transactions; query transactions.]
• NQS will make use of the existing Oracle database (minimal mods to the schema)
• EDS will communicate with NQS via JMI (Java Message Interface)
• May use the Spring framework; if not initially, then more likely later on (NDS)
• NDS will include significant data model and process changes:
    - Title management
    - Conflicts
    - Oracle schema cleanup
• NQS/NDS combined will allow integration of currently stand-alone functions (OAI-PMH)
• After NQS/NDS: possibly augment/replace back-end database (satellite DBs)