Metadata Quality and Capital
Disseminators and Service Providers
November 20, 2014
Jane Greenberg
Professor, College of Computing & Informatics
Director, Metadata Research Center
Dec 21, 2015
Your data is only as good as your metadata
Metadata is a first-class object
Toothbrush
The topic…
Good enough is not bad (DRYAD)
ROI – return on investment (CAPITAL)
RDA – Research Data Alliance (COMMUNITY)…. time permitting
Pre-populated metadata field
Data downloads reuse citation
Observations, motivating study of metadata capital
1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Download 10678 times
Journal | Re.Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. | Y
Type | Total | 30 days
Data packages | 6781 | 198
Data files | 20832 | 957
Journals | 361 | 72
Authors | 24166 | 3312
Downloads | 635348 | 37611
• Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
• X >10GB = $15,$10+
Technology
• DSpace
• DOIs via CDL/DataCite
• CC0 (<m> + data)
• Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)
Governance
• “non-profit status, 12 member Board of Directors”
• Sets policy, goals
• science, journals, societies, OCLC, MS
2006 Dryad development – NESCent + <MRC>
• Stakeholders: journals, publishers and scientific societies, and researchers.
2009-2012: Interim Board
$ PAYMENT-Sept. 1,2014
Singapore Framework
Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)
Vision
1. Simple: automatic metadata gen; heterogeneous datasets *Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - Concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)
Interoperability slope
Semantic ontologies
Researcher names
Agency/institution
Package metadata harvested from email
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)
The leap - capital to metadata capital
An economic concept (Weber, 1905; Smith, 1776)
• Business and operations (net gains or losses)
• Finances, goods and services, and public needs
• Intellectual capital, social capital
• a tangible result, value increase
Metadata as an asset, a product
• Reuse of good quality metadata increases the value of the initial investment
• Poor quality may reduce metadata capital?
• Metadata reuse prevalence
• Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, Linked data/Semantic Web, PIDs, etc.
Modified Capital-sigma notation
Reuse
\[ R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n \]
R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuse
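As a rough illustration of the notation above, the sum can be evaluated directly: start from the record's value R and add one increment per reuse. The function name and the figures below are hypothetical, chosen only to show the arithmetic; the slides do not say how R or the increments would be measured.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + a_1 + a_2 + ... + a_n for a record reused n times.

    record_value is R, the value of the metadata record.
    increments holds a_1 .. a_n, the incremental increase in value per reuse.
    """
    return record_value + sum(increments)


# Hypothetical figures: a record valued at 10.0 units and reused three times.
capital = metadata_capital(10.0, [1.5, 1.0, 0.5])
print(capital)  # 13.0
```

A record that is never reused keeps only its initial value, since the sum over an empty list of increments is zero.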
Author/Submitter | Curator
100 metadata instantiations
• 8 of 12 metadata properties had reuse @ 50% or greater
• 5 of 8 confirmed reuse at 80% or higher.
• Basic bib. vs. complex
Author
Subject
Dcterms.spatial
DwC.ScientificName
Modified Capital-sigma notation for linked data
Reuse of linked data concept/URI
P = Determined by the number of terms in an ontology, labor hours to generate, integrate, etc,
Helping Interdisciplinary Vocabulary Engineering (HIVE)
• C V cost, interoperability, and usability constraints
• Linked Open Vocabulary initiative, to support inter/transdisciplinary….
• SKOS (a little dumb)
• AMG + machine learning approach for integrating discipline terminologies
Amy
Meet Amy Zanne. She is a botanist.
Like every good scientist, she publishes, and she deposits data in Dryad.
Amy’s data
Successive growth rates
\[ \sum_{i=1}^{n} i^{c} = \Theta(n^{c+1}) \]
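The growth-rate statement above can be checked numerically. For a fixed exponent c greater than -1, the sum of i^c for i from 1 to n, divided by n^(c+1), settles toward the constant 1/(c+1), which is what the Θ bound expresses. The sketch below is a numerical illustration only; the function names are hypothetical, and nothing here models how reuse of a concept actually grows.

```python
def power_sum(n: int, c: float) -> float:
    """Sum of i**c for i = 1 .. n."""
    return sum(i ** c for i in range(1, n + 1))


def normalized(n: int, c: float) -> float:
    """Ratio of the sum to n**(c + 1); it stays bounded as n grows."""
    return power_sum(n, c) / n ** (c + 1)


# With c = 2 the ratio approaches 1/3 as n increases.
for n in (10, 100, 1000):
    print(n, normalized(n, 2))
```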
Cycles…
What about successive growth rate tied to a concept? A concept can be
• in ~ vernacular to canonical
• fall by the wayside, less popular
• out (deprecated)
Conclusion…other Valuation Approaches
Market cap of Facebook per user: $40 – $300
Revenues per record per user: $4 – $7 per year
• Facebook
• Experian
Market prices of personal data:
• $0.50 for street address
• $2.00 for date of birth
• $8 for social security number
• $3 for driver’s license number
• $35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.
Concluding remarks
• Interest….traction
• Limitations: bad data, cost/value
• We should care about cost
• Metadata capital can contextualize
• Generic formula for further research
Metadata Standards Directory Working Group….
Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela
“…develop a collaborative, open directory of metadata standards applicable to scientific data”
Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)
Goals and workplan - DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards
Acknowledgments
• Dryad Consortium Board, journal partners, and data authors
• NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
• Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
• U British Columbia: Michael Whitlock
• NCSU Digital Libraries: Kristin Antelman
• HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
• Yale/TreeBASE: Youjun Guo, Bill Piel
• DataONE: Rebecca Koskela, Bill Michener, Dave Veiglais, and many others
• British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
• Oxford University: David Shotton
http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki
http://code.google.com/p/[email protected]
Facebook: Dryad Twitter: @datadryad
http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc
Sustainability: Plan Comparison
Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
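To compare the three organizational plans above for a given journal, the per-unit rates can be multiplied out. The sketch below is a simplification under stated assumptions: the voucher minimum is treated as paying for at least 25 vouchers, contract lengths are ignored, and the subscription fee is taken as the per-article rate times the number of published articles. The function name and the sample volumes are hypothetical.

```python
from typing import Dict

# Rates in USD, taken from the plan comparison.
MEMBER_RATES = {"voucher": 65, "deferred": 70, "subscription_per_article": 25}
NONMEMBER_RATES = {"voucher": 70, "deferred": 75, "subscription_per_article": 30}
VOUCHER_MINIMUM = 25  # minimum purchase for the Voucher Plan


def plan_costs(data_packages: int, articles: int, member: bool) -> Dict[str, int]:
    """Estimate the yearly cost in USD under each plan (assumptions as noted above)."""
    rates = MEMBER_RATES if member else NONMEMBER_RATES
    return {
        # Assumption: at least VOUCHER_MINIMUM vouchers must be bought.
        "voucher": max(data_packages, VOUCHER_MINIMUM) * rates["voucher"],
        "deferred": data_packages * rates["deferred"],
        "subscription": articles * rates["subscription_per_article"],
    }


# Hypothetical member journal: 40 data packages from 200 published articles.
print(plan_costs(40, 200, member=True))
# {'voucher': 2600, 'deferred': 2800, 'subscription': 5000}
```

Under these assumptions the subscription plan only becomes the cheaper option when a large share of published articles deposit data packages.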
More on growth and sustainability
• Membership: http://datadryad.org/pages/membershipOverview
• Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
• Journal integration: http://datadryad.org/pages/journalIntegration