Toke Eskildsen [email protected] ELAG 2014 – Scanning woes and war stories - 1/28
Scanning woes and war stories
ELAG 2014
Toke Eskildsen, IT nerd (boss says “System Architect”)
State and University Library
Denmark
“Everything online in 2020”
- Vision
We would like plentiful, raw, visible, solid pixels
Zoom
Not like this! Like this!
Histogram
Reference
Adjust Color Levels
That's a nice scan!
A shame it was sharpened
Haloes around text indicate sharpening
Reference
But this one seems fine?
Sharpened and JPEG compressed
Square areas and localized noise indicate JPEG compression
Reference
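The "square areas" indicator can be roughed out in code. The following is a minimal sketch, not a tool from this talk: it assumes an 8-pixel JPEG block grid that starts at x = 0 (true only for uncropped, unscaled images), only looks at horizontal steps, and uses a hypothetical function name. Boundary steps that are clearly larger than interior steps hint at blocking.

```python
def boundary_contrast(image, block=8):
    """Return (boundary_mean, interior_mean) of absolute horizontal steps.

    image: list of rows, each a list of grey values.
    Assumption: the block grid starts at x = 0 and is `block` pixels wide.
    """
    boundary_sum = boundary_count = 0
    interior_sum = interior_count = 0
    for row in image:
        for x in range(len(row) - 1):
            step = abs(row[x + 1] - row[x])
            if (x + 1) % block == 0:
                # Step between the last pixel of one block and the first of the next
                boundary_sum += step
                boundary_count += 1
            else:
                # Step between two pixels inside the same block
                interior_sum += step
                interior_count += 1
    if boundary_count == 0 or interior_count == 0:
        raise ValueError("image is too narrow to compare block boundaries")
    return boundary_sum / boundary_count, interior_sum / interior_count
```

A smooth gradient gives roughly equal numbers; an image made of flat 8-pixel squares gives a large boundary mean and an interior mean near zero. Real scans fall in between, so compare against a known-good reference scan rather than a fixed cut-off.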
Lossless! We Promise!
Lossless workflow
A chain is only as strong...
Lossless! We Promise!
JPEG
This one? Please?
JPEG 2000 compression
Signs of lossy JPEG 2000 compression are best learned from multiple examples
Reference
Burnout
Sharp spikes at either end of the histogram indicate burnout
Reference
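The burnout check on the histogram can be automated. A minimal sketch, assuming 8-bit greyscale values in a flat list; the function names and the 1% threshold are illustrative assumptions, not values from this talk.

```python
def burnout_fractions(pixels, levels=256):
    """Return (black_fraction, white_fraction) for a flat list of grey values."""
    if not pixels:
        raise ValueError("no pixels given")
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    # The two extreme bins hold everything that was clipped to pure black or white
    return histogram[0] / total, histogram[levels - 1] / total


def has_burnout(pixels, threshold=0.01, levels=256):
    """Flag the image if either extreme bin holds more than `threshold` of all pixels.

    Assumption: 1% is an arbitrary starting point; tune it per collection.
    """
    black, white = burnout_fractions(pixels, levels)
    return black > threshold or white > threshold
```

A scan whose paper white sits at 240 instead of 255 passes; a scan where 10% of the pixels are exactly 255 is flagged.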
Burnout - visualization
But we need the dark to read!
No you don't!
Visualisation of ALTO-OCR files: https://github.com/tokee/quack
¦ “. i N ¦ M i; sk s, 011 el en al vei dens -t oi - te kom ei ner milen t'i bi -i i. 1 1 id iiiip .il -ukkersvg, I i ; \\ . : i .i i mod! , 1 1 km ikui i ene i !i;i • 1. t.v U.ilisk kali! Dit ei l u 't eksi isk.a bet /.> . i : 1 1 'I I l'b.! : man 'lit le.l b der ledes at Lundbecks tulbpeie forsk mi I I i 1 ' kt ' '1 F.va Sti 11 iess m i m ! o! i i . ! i , v! I ; vende l.epemiddel • ¦ ! a a 1. 1 >!:' a ! t \\ p. ¦ 2 tliahvtcs ; n 'i i te '. oi st, ¦ Klm.sKe i ol sop I n slik kel s vpe pat len! er Di amerikanske 1 1 1 1 1 1 1 i , F! >A il pi ve";
ABBYY FineReader 10.5
Some software upgrades matter!
Novo Nordisk, som er en af verdens største koncerner inden for behandling af sukkersyge, bliver nu mødt af konkurrence fra en ny dansk kant. Det er biotekselskabet Zea - land Pharmaceuticals, der ledes af Lundbecks tidligere forsk - ningsdirektør Eva Steiness, som fører et nyt lovende lægemiddel til behandling af type 2-diabetes frem til de første kliniske forsøg på sukkersyge-patienter. De amerikanske sundheds - myndigheder, FDA, har givet
ABBYY FineReader 11
Post-processing
Holes in the histogram indicate leveling, exposure or contrast adjustment
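Why holes appear: stretching a narrow range of grey values over a wider one leaves output values that no input maps to. A minimal sketch that lists such holes, assuming 8-bit greyscale values in a flat list; the function name is hypothetical.

```python
def histogram_holes(pixels, levels=256):
    """Return the empty grey values that lie between the darkest and
    brightest value actually present in the image."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    occupied = [level for level, count in enumerate(histogram) if count > 0]
    if not occupied:
        return []
    darkest, brightest = occupied[0], occupied[-1]
    return [level for level in range(darkest, brightest + 1)
            if histogram[level] == 0]


# Illustration: 50 consecutive grey values, then a 2x contrast stretch
original = list(range(100, 150))
stretched = [(value - 100) * 2 + 50 for value in original]
```

Here `histogram_holes(original)` is empty, while `histogram_holes(stretched)` contains every odd value from 51 to 147. Small or near-uniform images can have holes without any post-processing, so treat a regular comb pattern on a full-size scan as the real signal.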
Beware: Post-processing + JPEG
Post-processing indicators become less distinct when the image is JPEG compressed
Reference
Notice the lines?
300 DPI @ 20x enlargement
300 DPI @ 19x enlargement
Every other line
Time for scanner calibration
GENERAL CAMERA SETTINGS:
Camera Model No.: P2-20-08K40
Camera Serial No.: 11041074
Camera Network ID: 0
Network Message Mode: disabled
Firmware Design Rev.: 03-081-20017-01 Aug 29 2007
DSP Design Rev.: 03-056-20013-00
SETTINGS FOR UNCALIBRATED MODE:
Analog Gain (dB): +0.0 +0.0
Analog Offset: 634 630
SETTINGS FOR CALIBRATED MODE:
Analog Gain (dB): -0.4 -0.5
Analog Offset: 624 630
Digital Offset: 0 0
Calibration Status: FPN [uncalibrated] PRNU [calibrated]
SETTINGS COMMON TO CALIBRATED AND UNCALIBRATED MODES:
System Gain: 0 0
Background Subtract: 0 0
Pretrigger: 0
Number Of Line Samples: 32
Video Mode: calibrated
Data Mode: 0
Exposure Mode: 4
SYNC Frequency: external (9398.09) Hz
Exposure Time: external
End-Of-Line Sequence: on
Upper Threshold: 240
Lower Threshold: 15
Region Of Interest: 0001 - 8192
OK>
Systematic alternating lines indicate that the scanner should be calibrated
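The alternating-line pattern can be measured instead of spotted by eye. A minimal sketch, assuming a greyscale image as a list of equally long rows; whether the stripes run along rows or columns depends on the scanner, so the axis is a parameter. The function names are hypothetical.

```python
def line_means(image, axis="rows"):
    """Mean grey value per row (axis="rows") or per column (axis="columns")."""
    lines = image if axis == "rows" else list(zip(*image))
    return [sum(line) / len(line) for line in lines]


def alternating_line_score(image, axis="rows"):
    """Absolute difference between the mean of even-numbered lines and the
    mean of odd-numbered lines. Near zero for a well-calibrated scanner."""
    means = line_means(image, axis)
    if len(means) < 2:
        raise ValueError("need at least two lines to compare")
    even = means[0::2]
    odd = means[1::2]
    return abs(sum(even) / len(even) - sum(odd) / len(odd))


# Illustration: every other row is 10 grey levels brighter
striped = [[100] * 4 if y % 2 == 0 else [110] * 4 for y in range(6)]
```

For `striped` the row score is 10.0 and the column score is 0.0. Run it on a blank or evenly lit area of the scan, since page content with a strong brightness gradient also shifts the score slightly.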
Last one is tricky
New tool – Grid lines
Upscaling
I'm working on a small tool for detecting scaling. Very alpha: https://github.com/tokee/telltale
Are your scans just fine?
Toke Eskildsen, Statsbiblioteket
http://en.statsbiblioteket.dk/newsdigi
http://[email protected]
@TokeEskildsen