Documenting Research Project Process for Reproducibility
Larry Hoyle
Institute for Policy & Social Research, University of Kansas
Dagstuhl Presentation 2012, 10/22/2012

The challenges

• Large (or complex) multi-disciplinary projects
  – Multiple sites, data streams, standards, and practices
  – Complex data preparation procedures
• Point-and-click software used
• Documenting seen as overhead

Example Project

• Farmers' land-use decisions related to climate change (e.g., biofuel-related crops)
• One component of a larger NSF grant
• Multiple teams, multiple universities
  – The two main sites are 135 km apart
• Multi-disciplinary
  – Economists, geographers, agronomists, biologists, engineers, climate scientists, anthropologist, sociologist, political scientists, urban planner, GIS experts, photographer

Example Project Data

• Develop a substantial geodatabase (ArcSDE)
  – Ground cover, soils, crop statistics, facility locations (e.g., purchaser, processing plant), weather, climate, watershed and aquifer models
  – Geographic level below the individual (farmer's) field
• Climate models at different scales
• Focus groups and multi-wave survey (geocoded)
• Interviews coded in NVivo (geocoded)
• Photographs
• Large proprietary dataset with time-limited use

Challenge: put it all together, and document how it was done and how everything relates.

Other example: IASSIST posting

Spatial Aspects

• Reconciling different spatial schemes at multiple scales across time
  – Raster images
  – Model grids at different scales
  – Weather point sources and other point locations (e.g., biorefineries)
  – Political entity polygons (state, county)
  – Farm field and sub-field polygons
  – Attribute data at all these levels, including imputed and aggregated data
• Harmonizing data from different geographic schemes
• Producing new spatial objects
  – E.g., the corners of a field, treated separately from the circle under center-pivot irrigation

New Polygons

Polygons to be extracted from remote sensing imagery

Subfield areas sometimes grow different crops (the corners are 21% of the square)
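
The 21% figure can be checked under the assumption (not stated on the slide) that the center-pivot circle is inscribed in a square field of side s, so the circle's radius is s/2:

```latex
\frac{s^2 - \pi \left(\frac{s}{2}\right)^2}{s^2} = 1 - \frac{\pi}{4} \approx 0.2146
```

Roughly 21% of the square therefore lies in the four corners, outside the irrigated circle, regardless of the field's size.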

Need to Capture Process: Example 1

• A project member with the relevant expertise volunteered to process data to produce a spatial dataset (soils data).
• Users of the dataset discover anomalies.
• The expert is no longer available, can't remember quite what he did, and has no documentation (he used point-and-click tools).
• Ouch

Process Example 2

• Qualitative analysis
  – Transcription
  – Multiple coders, common coding scheme
  – Coding scheme evolves (capture this?)
  – Training
  – Paired coders code each interview
  – Testing of coder reliability
• Integrate this after the fact with the geodatabase
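
One common way to test reliability between two paired coders is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slide does not say which statistic the project used, so this is a minimal sketch under that assumption; the function name `cohens_kappa` and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty list of segments")
    n = len(coder_a)
    # Observed agreement: share of segments where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance that two coders with these marginal
    # code frequencies would agree at random.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by one coder pair to the same six interview segments.
a = ["price", "price", "risk", "policy", "risk", "price"]
b = ["price", "risk", "risk", "policy", "risk", "price"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Here the coders agree on 5 of 6 segments, but because chance agreement is about 0.36, kappa (about 0.74) is lower than raw agreement (about 0.83).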

Point and Click

• Some tools are point-and-click only and don't create a log
  – E.g., some procedures in ArcGIS
• How do you document the process?
  – Screen captures pasted into Word?
  – Action-recording software?
  – Discoverable? Machine-actionable?
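
One way to make a record of manual, point-and-click steps both discoverable and machine-actionable is to append each step to a time-stamped, structured log. The following is a minimal sketch assuming a JSON Lines file; the field names, the log location and the example ArcGIS action are hypothetical and are not drawn from any standard or from the project itself.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log location; a project would choose its own.
LOG = Path(tempfile.gettempdir()) / "process_log.jsonl"


def log_step(path, tool, action, inputs, outputs, note=""):
    """Append one manually performed step to a JSON Lines log (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when it was recorded
        "tool": tool,        # software used, e.g. a point-and-click GUI
        "action": action,    # what was done, in the researcher's words
        "inputs": inputs,    # datasets the step read
        "outputs": outputs,  # datasets the step wrote
        "note": note,        # free-text annotation
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_steps(path):
    """Load every logged step back as a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def steps_producing(steps, dataset):
    """Find the logged steps that wrote a given dataset (discoverability)."""
    return [s for s in steps if dataset in s["outputs"]]


# Hypothetical example: recording a manual ArcGIS operation on soils data.
log_step(LOG, "ArcGIS", "Clip soils layer to study area",
         inputs=["soils.shp", "study_area.shp"], outputs=["soils_clip.shp"],
         note="Default settings accepted")
for step in steps_producing(read_steps(LOG), "soils_clip.shp"):
    print(step["timestamp"], step["tool"], step["action"])
```

JSON Lines is used here because each step is appended independently, so a log that is interrupted partway through still parses line by line.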

An ArcGIS process (different project)

NSFCHEMAnnualDataProcedure.docx

AnnualLinksByTime4.avi

Need Tools

• There is a need for tools built on top of standards that make it easy to capture and annotate process

Need Tools to Capture Process

One example: SAS Enterprise Guide

• Nodes can be modified during development
• The process can be run from any point
• But the overall process may involve multiple tools (in this case, also R and ArcGIS) or, in other cases, multiple people in different settings

Scott Long, The Workflow of Data Analysis Using Stata: http://www.indiana.edu/~jslsoc/web_workflow/wf_home.htm

Datasets – Permanent and temporary
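
Being able to modify a node and rerun the process from any point can be sketched as an ordered list of named steps that read and write named datasets in a shared store. Rerunning from a given step reuses the stored results of the earlier steps. This is a simplified illustration that assumes a linear flow; it is not how SAS Enterprise Guide is implemented, and the step names and functions are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Store = Dict[str, Any]  # named datasets, permanent or temporary
Step = Tuple[str, Callable[[Store], None]]


def run_from(steps: List[Step], start: str, store: Store) -> List[str]:
    """Run the steps from `start` onward, reusing results already in `store`.

    Returns the names of the steps that were executed.
    """
    names = [name for name, _ in steps]
    if start not in names:
        raise KeyError(f"unknown step: {start}")
    executed = []
    for name, func in steps[names.index(start):]:
        func(store)  # each step reads earlier datasets and writes its own
        executed.append(name)
    return executed


# Hypothetical three-step flow: import, clean, summarize.
def import_data(store: Store) -> None:
    store["raw"] = [3, None, 5, 7]


def clean(store: Store) -> None:
    store["clean"] = [x for x in store["raw"] if x is not None]


def summarize(store: Store) -> None:
    store["mean"] = sum(store["clean"]) / len(store["clean"])


flow: List[Step] = [("import", import_data), ("clean", clean), ("summarize", summarize)]
store: Store = {}
run_from(flow, "import", store)
print(store["mean"])  # 5.0


# Modify one node during development, then rerun only from that node onward.
def clean_v2(store: Store) -> None:
    store["clean"] = [x for x in store["raw"] if x is not None and x < 7]


flow[1] = ("clean", clean_v2)
print(run_from(flow, "clean", store), store["mean"])  # ['clean', 'summarize'] 4.0
```

A fuller tool would model branching flows as a graph and track which downstream results became stale after an edit; this sketch leaves both out.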

Capturing Process as It Is Being Developed

• False starts and blind alleys
  – Does the whole process matter, or only a process that reproduces the final result? (Learn from my mistakes?)
  – The description of the process gets edited as it evolves
• Adding minimal overhead
  – If the tool requires a lot of attention, it won't get used
• Combining sub-processes
• Filling in pieces of the overall planned project
• Parallel parts
• Time as ordinal or interval (or ratio?)

Tools – The Fantasy

• Annotated screen capture that works on top of any software
  – Text (or audio/video?) annotation
  – Dealing with intellectual property (IP) in captured images
  – Flow diagram with popups?
  – Editable
  – Time-stamped

Sub-process edited separately

Planned overall process

Persistent identifiers allow (re-)linking
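
The idea that persistent identifiers let a sub-process be edited separately and then (re-)linked into the planned overall process can be sketched with a registry keyed by identifier. The overall plan stores only identifiers, so a new version of a sub-process saved under the same identifier is picked up by every plan that refers to it. The registry, class names and fields are hypothetical, and `uuid4` merely stands in for whatever persistent-identifier scheme a project would actually use (a DOI or a Handle, for example).

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class SubProcess:
    title: str
    steps: List[str]
    version: int = 1
    # Time stamp of this version, set when the object is created.
    edited: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class Registry:
    """Maps persistent identifiers to the current version of each sub-process."""

    def __init__(self) -> None:
        self._items: Dict[str, SubProcess] = {}

    def register(self, sub: SubProcess) -> str:
        pid = str(uuid.uuid4())  # stand-in for a real persistent identifier
        self._items[pid] = sub
        return pid

    def edit(self, pid: str, steps: List[str]) -> None:
        """Edit a sub-process separately; its identifier stays the same."""
        old = self._items[pid]
        self._items[pid] = SubProcess(old.title, steps, old.version + 1)

    def resolve(self, pid: str) -> SubProcess:
        return self._items[pid]


# The planned overall process refers to sub-processes only by identifier.
registry = Registry()
soils = registry.register(SubProcess("Prepare soils layer", ["clip to study area"]))
coding = registry.register(SubProcess("Code interviews", ["transcribe", "code in pairs"]))
plan = [soils, coding]

registry.edit(soils, ["clip to study area", "fix anomalies"])
for pid in plan:
    sub = registry.resolve(pid)
    print(f"{sub.title} v{sub.version}: {sub.steps}")
```

Because the plan holds identifiers rather than copies, no edit to the overall process is needed when a sub-process changes.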

Final thoughts

• Metadata for the audience
  – Documentation for reproducibility
  – Documentation in cases of disputed results
• Sometimes the researcher is the audience
  – One researcher commented that documentation at this level would be very helpful in writing the methods sections of papers
  – Teaching tool: critique students' processes
  – Assists in refining methods
  – Also useful in future similar projects
