Data Publishing in Archaeozoology or “Everybody knows that a 14 is a Sheep” Sarah Whitcher Kansa Alexandria Archive Institute OpenContext.org Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Data Publishing in Archaeozoology
or “Everybody knows that a 14 is a Sheep”
Sarah Whitcher Kansa
Alexandria Archive Institute
OpenContext.org
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License
<http://creativecommons.org/licenses/by/3.0/>
Main Points
- Reproducibility and new research opportunities require data sharing
- Raw data are not sufficient
- Publishing open data on the Web is a solution
- Publishing data takes special expertise
Good scientific practice requires data sharing.
We cannot trust results based on hidden data.
• Limits of print (entrenched practice but not best practice)
• Data preservation crisis (wasted effort)
• Hard to compare and integrate data now
The Challenges
Policy Consensus:
Urgent Need for Better Data Practices!
DIPIR (http://www.dipir.org)
3-Year project, Oct. 2010-Sept. 2013
National Leadership Grant from the Institute for Museum and Library Services (LG-06-10-0140-10)
Ixchel Faniel (PI), Elizabeth Yakel (Co-PI)
Raw Data Can Be Unappetizing
Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
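The coding habit described in the quote can be illustrated with a small sketch. It assumes a hypothetical codebook in which only the 14 = sheep pairing comes from the interview; the other codes, the field names, and the helper functions are invented for illustration. The point is that a numeric code is only intelligible when the "little book" travels with the data.

```python
# Hypothetical codebook. Only 14 -> "sheep" comes from the quoted interview;
# every other code and label below is an invented placeholder.
CODEBOOK = {
    14: "sheep",
    15: "goat",    # assumed for illustration
    20: "cattle",  # assumed for illustration
}


def decode_taxon(code, codebook=CODEBOOK):
    """Return the label for a numeric taxon code, or flag it as undocumented."""
    return codebook.get(code, f"UNDOCUMENTED CODE {code}")


def decode_rows(rows, codebook=CODEBOOK):
    """Return copies of the specimen records with a readable 'taxon' field added."""
    return [
        {**row, "taxon": decode_taxon(row["taxon_code"], codebook)}
        for row in rows
    ]


# Hypothetical specimen records as they might sit in a spreadsheet export.
specimens = [
    {"specimen_id": "A1", "taxon_code": 14},
    {"specimen_id": "A2", "taxon_code": 99},  # not in the codebook
]

decoded = decode_rows(specimens)
for row in decoded:
    print(row["specimen_id"], row["taxon"])
```

A reader without the codebook sees only 14 and 99. With it, the first record becomes legible and the second is flagged as needing documentation before publication.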
A long way to go before we get usable, intelligible data
Sometimes data is better served cooked.
Adapt “publishing” metaphor to digital data
• Cleaned, described, organized
• More intelligible and cohesive
• Open access
• Linked to other resources (including print publications)
• Machine-readable for discovery and reuse
• Archived and curated (CDL)
What is Data Publication?
Putting editorially-vetted data on the Web
Benefits & Challenges
The Good:
• Enhanced presentation
• Enhanced search, discovery, understanding
• Depth & breadth (linked to project data, other datasets, print publications, etc.)
• Allowing for Linked Open Data = facilitates future use
• Professional advancement
The Bad:
• Takes time, effort
• Requires informatics expertise
Benefits need to outweigh challenges
Thousand Flowers
Started in 2007
Integrates and publishes various forms of archaeological documentation (structured data, media, documents)
Not a repository, but archived with California Digital Library
Interoperability via web services, increasing emphasis on Linked Data
Data Publishing
Data Quality and Standards Alignment
(1) Check consistency
(2) Edit functions
(3) Align to common standards (“Linked Data” if applicable)
(4) Issue tracking, version
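Steps (1) and (3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the editorial workflow actually used: the vocabulary mapping, the example.org URIs, and the function names are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical mapping from local labels to shared-vocabulary identifiers.
# The example.org URIs are placeholders, not terms from a real vocabulary.
STANDARD_TERMS = {
    "sheep": "http://example.org/taxon/sheep",
    "goat": "http://example.org/taxon/goat",
}


def normalize(label):
    """Lower-case a label and collapse stray whitespace."""
    return " ".join(label.strip().lower().split())


def check_consistency(labels):
    """Step (1): report normalized labels that were entered with several spellings."""
    variants = {}
    for raw in labels:
        variants.setdefault(normalize(raw), Counter())[raw] += 1
    return {norm: dict(seen) for norm, seen in variants.items() if len(seen) > 1}


def align(labels, terms=STANDARD_TERMS):
    """Step (3): pair each raw label with a shared identifier, collecting misses."""
    aligned, issues = [], []
    for raw in labels:
        uri = terms.get(normalize(raw))
        aligned.append((raw, uri))
        if uri is None:
            issues.append(raw)  # would be handed to issue tracking, step (4)
    return aligned, issues


# Hypothetical column of taxon labels from a contributed spreadsheet.
raw_labels = ["Sheep", "sheep ", "Goat", "shep"]

inconsistent = check_consistency(raw_labels)
aligned, issues = align(raw_labels)
print(inconsistent)
print(issues)
```

Labels that fail to align are kept rather than silently dropped, so an editor can resolve them with the data contributor.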
EOL (2012) funding for publishing additional zooarchaeology datasets (Neolithic Anatolia), in project led by Ben Arbuckle (Baylor University)
NEH (2012) funding for publishing trade + exchange related datasets (Bronze-Iron Age Mediterranean)
Data Publishing Projects
Complement Conventional Publishing
Lockwood Press (“Archaeobiology Series”), Cotsen Institute Press (UCLA)
Data Publishing Projects
Driven by research interests and publication goals among researchers wanting to compare datasets, create reference collections, and have citable, full datasets linked to synthetic publications.
Summary
Outcomes of Publishing Data:
(1) Make “datasets” first class citizens in world of scholarly communications
(2) Provide needed transparency to published interpretations
(3) Enable new kinds of multi-disciplinary research across many datasets
Thank you!
Special Thanks!
Canan Çakırlar, RCAC, Koç University, ICAZ, and other sponsors