1 11-20-14/Greenberg Metadata Quality and Capital Disseminators and Service Providers November 20, 2014 Jane Greenberg Professor, College of Computing & Informatics Director, Metadata Research Center

Dec 21, 2015


Clifton Randall
Transcript
Page 1:

Metadata Quality and Capital

Disseminators and Service Providers
November 20, 2014

Jane Greenberg
Professor, College of Computing & Informatics
Director, Metadata Research Center

Page 2:

Your data is only as good as your metadata

Metadata is a first class object

Page 3:

Toothbrush

Page 4:

The topic…

Good enough is not bad (DRYAD)

ROI – return on investment (CAPITAL)

RDA – Research Data Alliance (COMMUNITY)…. time permitting

Page 5:
Page 6:

Page 7:
Page 8:

Pre-populated metadata field

Page 9:

Page 10:

Data downloads reuse citation

Observations, motivating study of metadata capital
1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing

Download 10678 times

Page 11:

Journal | Re.Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. | Y

Type | Total | 30 days
Data packages | 6781 | 198
Data files | 20832 | 957
Journals | 361 | 72
Authors | 24166 | 3312
Downloads | 635348 | 37611

• Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals

• X >10GB = $15,$10+

Page 12:

Technology
• DSpace
• DOIs via CDL/DataCite
• CC0 (<m> + data)
• Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)

Governance
“non-profit status, 12 member Board of Directors”

Sets policy, goals
• science, journals, societies, OCLC, MS

2006 Dryad development – NESCent + <MRC>
• Stakeholders: journals, publishers and scientific societies, and researchers.

2009-2012: Interim Board

$ PAYMENT - Sept. 1, 2014
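
The OAI-PMH route listed on this slide can be sketched in Python. This is only an illustration under stated assumptions: the endpoint URL and the sample response are hypothetical and do not come from Dryad or TreeBASE. Only the `ListRecords` verb, the `oai_dc` metadata prefix, and the namespace URIs are taken from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# Namespace URIs defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a harvester."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

# Made-up response with one record, so the sketch runs without a network.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

Each harvest of this kind reuses metadata that was created once at deposit time, which is the reuse pattern the talk goes on to value.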

Page 13:

Page 14:

Singapore Framework

Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)

Vision
1. Simple: automatic metadata gen; heterogeneous datasets
   *Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing

Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.

Page 15:

Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - Concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)

Page 16:

Interoperability slope

Semantic ontologies

Researcher names

Agency/institution

Page 17:

Page 18:
Page 19:

Package metadata harvested from email

Subj. 177 (gr. 97%, rd. 2%, bl. 1%)

Contr. 101 (gr. 99%, bl. 1%)

Page 20:

The leap - capital to metadata capital

An economic concept (Weber, 1905; Smith, 1776)
• Business and operations (net gains or losses)
• Finances, goods and services, and public needs
• Intellectual capital, social capital
• A tangible result, value increase

Metadata as an asset, a product
• Reuse of good quality metadata increases the value of the initial investment
• Poor quality may reduce metadata capital?
• Metadata reuse prevalence
• Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, Linked data/Semantic Web, PIDs, etc.

Page 21:

Modified Capital-sigma notation

Reuse

\[ R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n \]

R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuse
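
As a worked illustration of the notation on this slide (the record value R plus the sum of the increments a_1 through a_n), the short Python sketch below computes metadata capital after each reuse. The numeric values are hypothetical, chosen only to show the arithmetic, and the function name is not from the talk.

```python
from itertools import accumulate

def metadata_capital(initial_value, increments):
    """Return R + a_1 + ... + a_n for one metadata record."""
    return initial_value + sum(increments)

# Hypothetical values: a record worth 10.0 units, reused three times.
R = 10.0
a = [1.0, 1.0, 0.5]

total = metadata_capital(R, a)

# Running value after 0, 1, 2, and 3 reuses.
running = list(accumulate(a, initial=R))

print(total)    # 12.5
print(running)  # [10.0, 11.0, 12.0, 12.5]
```

With no reuse the capital stays at R, so the formula treats the initial generation cost as the baseline and each reuse as added value.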

Page 22:

Author/Submitter | Curator

100 metadata instantiations
• 8 of 12 metadata properties had reuse @ 50% or greater
• 5 of 8 confirmed reuse at 80% or higher
• Basic bib. vs. complex

Page 23:

Author

Subject

Dcterms.spatial

DwC.ScientificName

Page 24:

Modified Capital-sigma notation for linked data

Reuse of linked data concept/URI

P = Determined by the number of terms in an ontology, labor hours to generate, integrate, etc.

Page 25:

Helping Interdisciplinary Vocabulary Engineering (HIVE)

• C V cost, interoperability, and usability constraints
• Linked Open Vocabulary initiative, to support inter/transdisciplinary….
• SKOS (a little dumb)
• AMG + machine learning approach for integrating discipline terminologies

Page 26:
Page 27:

Amy

Meet Amy Zanne. She is a botanist.

Like every good scientist, she publishes, and she deposits data in Dryad.

Amy’s data

Page 28:

Page 29:

Successive growth rates

\[ \sum_{i=1}^{n} i^{c} = \Theta\left(n^{c+1}\right) \]
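
The growth-rate identity on this slide (the sum of i^c for i from 1 to n grows like n^(c+1)) can be checked numerically. The sketch below assumes c is a fixed non-negative constant, which the slide does not state, and uses c = 2 purely as an example. For such c, the ratio of the sum to n^(c+1) settles toward 1/(c+1), which is what the Theta bound implies.

```python
def power_sum(n, c):
    """Sum of i**c for i = 1..n."""
    return sum(i ** c for i in range(1, n + 1))

def ratio(n, c):
    """Ratio of the sum to n**(c+1); bounded if the sum is Theta(n**(c+1))."""
    return power_sum(n, c) / n ** (c + 1)

# Example exponent (an assumption, not a value from the talk).
c = 2
for n in (10, 100, 1000):
    print(n, ratio(n, c))
```

The printed ratios shrink toward 1/3 as n grows, so the sum neither outpaces nor falls behind n^3 by more than a constant factor.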

Cycles…

What about successive growth rate tied to a concept? A concept can be
• in ~ vernacular to canonical
• fall by the wayside, less popular
• out (deprecated)

Page 30:

Conclusion…other Valuation Approaches

Market cap of Facebook per user: $40 – $300

Revenues per record per user: $4 – $7 per year
• Facebook
• Experian

Market prices of personal data:
• $0.50 for street address
• $2.00 for date of birth
• $8 for social security number
• $3 for driver’s license number
• $35 for military record

SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.
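
To make the market-price figures on this slide concrete, the sketch below adds up the listed prices for a record that contains a chosen set of fields. Summing the prices is an assumption made here for illustration and is not presented as the OECD methodology. The dictionary values are the ones quoted on the slide.

```python
# Prices in USD, as quoted on the slide.
PRICES_USD = {
    "street address": 0.50,
    "date of birth": 2.00,
    "social security number": 8.00,
    "driver's license number": 3.00,
    "military record": 35.00,
}

def record_price(fields):
    """Add up the quoted price of each field present in a record.

    Assumption: field prices are simply additive.
    """
    return sum(PRICES_USD[field] for field in fields)

print(record_price(["street address", "date of birth"]))  # 2.5
print(record_price(PRICES_USD))                           # 48.5
```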

Page 31:

Concluding remarks

• Interest….traction
• Limitations: bad data, cost/value
• We should care about cost
• Metadata capital can contextualize
• Generic formula for further research

Page 32:

Metadata Standards Directory Working Group….

Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela

Page 33:

“…develop a collaborative, open directory of metadata standards applicable to scientific data”

Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)

Goals and workplan - DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards

Page 34:

Acknowledgments

Dryad Consortium Board, journal partners, and data authors

NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)

Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary

U British Columbia: Michael Whitlock

NCSU Digital Libraries: Kristin Antelman

HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts

Yale/TreeBASE: Youjun Guo, Bill Piel

DataONE: Rebecca Koskela, Bill Michener, Dave Veiglais, and many others

British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole

Oxford University: David Shotton

Page 35:

http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki

http://code.google.com/p/[email protected]

Facebook: Dryad Twitter: @datadryad

http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/

Metadata Research Center: http://cci.drexel.edu/mrc


Page 36:

Sustainability: Plan Comparison

Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
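
The plan comparison on this slide can be turned into a small cost calculator. The sketch below is a rough illustration: the prices come from the slide, but the function names are hypothetical, and it assumes that the 25-voucher minimum means at least 25 vouchers are paid for even when fewer packages are deposited. It ignores contract length.

```python
# Per-unit prices in USD, copied from the plan comparison table.
VOUCHER = {"member": 65, "non_member": 70}       # per data package
DEFERRED = {"member": 70, "non_member": 75}      # per data package
SUBSCRIPTION = {"member": 25, "non_member": 30}  # per published research article
PAY_ON_ACCEPTANCE = 80                           # per data package, individuals
VOUCHER_MINIMUM = 25                             # minimum voucher purchase

def voucher_cost(packages, status):
    # Assumption: at least 25 vouchers are bought, even if fewer are used.
    return max(packages, VOUCHER_MINIMUM) * VOUCHER[status]

def deferred_cost(packages, status):
    return packages * DEFERRED[status]

def subscription_cost(articles, status):
    return articles * SUBSCRIPTION[status]

def pay_on_acceptance_cost(packages):
    return packages * PAY_ON_ACCEPTANCE

def compare(packages, articles, status):
    """Annual cost in USD under each plan for one journal or society."""
    return {
        "voucher": voucher_cost(packages, status),
        "deferred": deferred_cost(packages, status),
        "subscription": subscription_cost(articles, status),
        "pay_on_acceptance": pay_on_acceptance_cost(packages),
    }

# Hypothetical journal: 10 data packages, 40 published articles, member.
print(compare(10, 40, "member"))
```

For this hypothetical member journal the deferred plan is cheapest, because the voucher minimum dominates at low deposit volumes.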

Page 37:

More on growth and sustainability

Membership: http://datadryad.org/pages/membershipOverview

Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing

Journal integration: http://datadryad.org/pages/journalIntegration