PIPING HOT: Little Bins in big workflows Alex Garnett Digital Preservation & Data Curation SFU Library

Dec 17, 2015

Edgar Bridges
Transcript
Page 1:

PIPING HOT: Little Bins in big workflows

Alex Garnett
Digital Preservation & Data Curation
SFU Library

Page 5:

Thesis: I am a terrible programmer

• 20% of you are thinking “no kidding!”

• The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.”

• Who needs impostor syndrome when you have a bash shell?

Page 9:

• For the record, this is the payoff from all those colonoscopy jokes. Yep.

Page 11:

But how does it apply to libraries?

[If MJ Suhonos is here this year, this is his cue to groan audibly]

Page 12:

LIBRARY PROBLEM #1: PDF/A

• ProQuest wants PDF/A submissions from now on

• “now on” apparently = the past five years’ backlog

• We have to convert five years of theses!

• This is now also being used at the UofA.
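
One way a backlog conversion like this could look, assuming Ghostscript is the converter (the slides do not say which tool was actually used). The function name `to_pdfa`, the `PDFA_DEF` and `PDFA_LEVEL` variables, and the `theses/` folder are hypothetical; the Ghostscript switches follow the pattern its documentation gives for PDF/A output. Fully valid PDF/A also depends on a `PDFA_def.ps` definition file that points at an ICC profile, so treat this as a sketch rather than a finished recipe.

```shell
# Sketch: convert one PDF to PDF/A with Ghostscript (an assumed tool choice).
# PDFA_DEF and PDFA_LEVEL are hypothetical knobs, not taken from the talk.
PDFA_DEF=${PDFA_DEF:-PDFA_def.ps}   # definition file shipped with Ghostscript
PDFA_LEVEL=${PDFA_LEVEL:-1}         # PDF/A conformance level to request

# to_pdfa IN.pdf OUT.pdf
to_pdfa() (
  in=$1
  out=$2
  gs -dPDFA="$PDFA_LEVEL" -dBATCH -dNOPAUSE \
     -sColorConversionStrategy=UseDeviceIndependentColor \
     -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 \
     -sOutputFile="$out" "$PDFA_DEF" "$in"
)

# Usage (commented out): convert every thesis in a hypothetical theses/ folder.
# mkdir -p pdfa
# for f in theses/*.pdf; do
#   to_pdfa "$f" "pdfa/$(basename "$f")" || echo "failed: $f" >&2
# done
```

Wrapping the call in a function keeps the long list of switches in one place, so the batch loop stays a few lines long. The output should still be checked with a PDF/A validator before it is submitted.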

Page 14:

LIBRARY PROBLEM #2: ARCHIVES PROBLEM: LIBRARY HARDER
STARRING BRUCE WILLIS

CRAP, I USED UP THE WHOLE SLIDE ON THE TITLE

Page 15:

• Archives needed a GUI tool to be able to create restrictive FTP accounts for donors.

Page 16:

LIBRARY PROBLEM #3: PDF REDACTION (IT’S LIKE THE FIRST ONE BECAUSE NO ONE LIKED THE SEQUEL, DOES ANYONE WANT TO WATCH TEMPLE OF DOOM LATER, OH HELL I’VE DONE IT AGAIN)

Page 17:

• We learned we had some poorly redacted PDFs

• Blackout meant to obscure text; still selectable

Page 18:

• Solution:

– Detect offending pages with ghostscript…

• (this is the hard part; dumping PDF guts is appalling)

Page 19:

• … and then:

– Snip offending pages with pdftk

– Convert them to images with imagemagick

– OCR back into PDF (minus obscured text) with tesseract and fix up the dimensions with gs again

– Paste back in with pdftk.

– 5 lines, all free tools! Documentation & piping.
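
The snip, rasterise, OCR, resize, and paste steps above can be sketched roughly as follows. This is not the author's actual script: the function names `splice_ranges` and `reocr_page`, the 300 dpi density, and the letter paper size are assumptions, and the page number is taken as given because the detection step is not shown. On ImageMagick 7 the `convert` command is spelled `magick`.

```shell
# Sketch only: rasterise one badly redacted page, OCR it, and splice it back.
# Function names, the 300 dpi density and the letter paper size are assumptions.

# Print the pdftk "cat" ranges that replace page $1 of document A
# (which has $2 pages) with the single-page document B.
splice_ranges() (
  page=$1
  total=$2
  ranges=B
  if [ "$page" -gt 1 ]; then ranges="A1-$((page - 1)) $ranges"; fi
  if [ "$page" -lt "$total" ]; then ranges="$ranges A$((page + 1))-end"; fi
  printf '%s\n' "$ranges"
)

# reocr_page IN.pdf PAGE OUT.pdf
reocr_page() (
  in=$1
  page=$2
  out=$3
  tmp=$(mktemp -d) || exit 1
  total=$(pdftk "$in" dump_data | awk '/^NumberOfPages:/ {print $2}') &&
    pdftk "$in" cat "$page" output "$tmp/page.pdf" &&        # snip the page
    convert -density 300 "$tmp/page.pdf" "$tmp/page.png" &&  # flatten to an image
    tesseract "$tmp/page.png" "$tmp/ocr" pdf &&              # OCR, writes ocr.pdf
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=letter \
       -dFIXEDMEDIA -dPDFFitPage \
       -sOutputFile="$tmp/fixed.pdf" "$tmp/ocr.pdf" &&       # fix the dimensions
    # The ranges are left unquoted on purpose so they split into separate words.
    pdftk A="$in" B="$tmp/fixed.pdf" \
       cat $(splice_ranges "$page" "$total") output "$out"   # paste back in
  status=$?
  rm -rf "$tmp"
  exit "$status"
)
```

For example, `splice_ranges 5 10` prints `A1-4 B A6-end`, and `reocr_page thesis.pdf 5 thesis-fixed.pdf` would run the whole chain on page 5 of a hypothetical `thesis.pdf`. Because the page is flattened to an image before OCR, text hidden under a blackout box never reaches the new text layer.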

Page 21:

Takeaway

• If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way

• There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.

Page 22:

Takeaway

• Open-source command line tools are really good these days! They are powerful, they are straightforward, and they are often cutting edge.

Page 23:

Surprise: Everybody gets a free colonoscopy after all!

• Thanks! [email protected] ; @axfelix