An Introduction to Document Scanning: Business Document Scanning 101, From the Data Capture Perspective

An Introduction to Document Scanning, Understanding Your Requirements

Jan 19, 2015

Learn about the basic decisions required for business document scanning: indexing, file formats, document resolution, color space, and more. Learn about estimating volumes and automated capture technology such as barcode recognition, OCR, and batch document processing.
Page 1: An Introduction to Document Scanning, Understanding Your Requirements

An Introduction to Document Scanning

Business Document Scanning 101: From the Data Capture Perspective

Page 2: An Introduction to Document Scanning, Understanding Your Requirements

So you have a lot of this?

Page 3: An Introduction to Document Scanning, Understanding Your Requirements

And you’ve decided this is the answer.

Page 4: An Introduction to Document Scanning, Understanding Your Requirements

So you need a crash course in scanning

Page 5: An Introduction to Document Scanning, Understanding Your Requirements

Lessons:

Lesson 1: Simplex or Duplex

Lesson 2: Resolution

Lesson 3: Color Depth

Lesson 4: File Formats

Lesson 5: Indexing

Lesson 6: Document Prep and Estimating Volumes

Homework: Learn More About Data Capture and Document Management

Page 6: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 1: Simplex or Duplex

Are the documents single- or double-sided? This may seem obvious, but…

Page 7: An Introduction to Document Scanning, Understanding Your Requirements

You may not want documents such as purchase invoices scanned in duplex where the back of the document only contains terms and conditions.

On the other hand, if the documents have high legal importance, you may want every conceivable item of information captured, such as small signatures or notes on the back.

Page 8: An Introduction to Document Scanning, Understanding Your Requirements

Duplex scanning requires more scanning time and processing, and results in larger files.

Page 9: An Introduction to Document Scanning, Understanding Your Requirements

And you don’t have to be a genius to know that is more costly.

Page 10: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 2: Resolution

Page 11: An Introduction to Document Scanning, Understanding Your Requirements

So what is resolution and why does it matter?

Page 12: An Introduction to Document Scanning, Understanding Your Requirements

What is Resolution?

Resolution is expressed as the number of dots per inch (dpi) or, less frequently, pixels per inch (ppi). A pixel, or “picture element”, is one of the samples that make up the image, so resolution really describes the rate at which the image was sampled.

Page 13: An Introduction to Document Scanning, Understanding Your Requirements

Implications of Resolution

This graphic contains two images: a “0” as a grayscale image and an “x” as black and white.

Page 14: An Introduction to Document Scanning, Understanding Your Requirements

Implications of Resolution

• If we halved the size of the grid squares horizontally and vertically (doubling the resolution), the edges would appear smoother and produce a better quality image. The inverse would be true if we doubled the size of the squares.

• If we kept the squares the same size but significantly reduced the size of the characters, the resolution would be insufficient.

Page 15: An Introduction to Document Scanning, Understanding Your Requirements

Implications of Resolution

So:

• The higher the resolution, the better the image quality.

• For small characters, increase the resolution to capture them effectively.

Page 16: An Introduction to Document Scanning, Understanding Your Requirements

And, the higher the resolution, the slower the scan and the larger the file.

Page 17: An Introduction to Document Scanning, Understanding Your Requirements

Which means higher scanning and file storage costs, Einstein.

Page 18: An Introduction to Document Scanning, Understanding Your Requirements

Typical Scanning Resolutions

• Web graphic – 96 dpi

• Standard archive document – 200 dpi

• Document required for optical character recognition (OCR) – 300 dpi

• Plans/drawings for vectorization – 400 dpi

• Documents required for historical archiving – 600 dpi

Resolution is generally determined by intended use.
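To make the typical resolutions concrete, here is a minimal sketch of how dpi translates into pixel dimensions. The 8.5 x 11 inch page size and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
# Typical resolutions from the list above (dpi), keyed by intended use.
TYPICAL_DPI = {
    "web graphic": 96,
    "standard archive document": 200,
    "document for OCR": 300,
    "plans/drawings for vectorization": 400,
    "historical archiving": 600,
}


def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def total_pixels(width_in, height_in, dpi):
    """Return the total number of pixels sampled from the page."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px


# Assumed page size: 8.5 x 11 inches (US Letter).
for use, dpi in TYPICAL_DPI.items():
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    print(f"{use}: {dpi} dpi -> {width_px} x {height_px} pixels")
```

Doubling the resolution quadruples the number of pixels, which is why scan time and file size climb so quickly.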

Page 19: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 3: Color Depth

Page 20: An Introduction to Document Scanning, Understanding Your Requirements

Understanding Black and White

Documents scanned in black and white are always scanned as grayscale within the scanner. The scanner then applies a process known as thresholding to the image to produce the black and white image. Thresholding simply determines when a pixel should be black or white.
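The thresholding step can be sketched in a few lines. This is only an illustration with a single fixed cutoff of 128, which is an assumption; scanners may use more sophisticated, adaptive methods.

```python
def threshold(gray_rows, cutoff=128):
    """Convert grayscale pixels (0 = black, 255 = white) to 1-bit values.

    Pixels at or above the cutoff become white (1); the rest become black (0).
    The fixed cutoff is an assumption made for this sketch.
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in gray_rows]


# A tiny made-up 3 x 4 grayscale patch.
gray = [
    [250, 240, 30, 245],
    [245, 20, 25, 250],
    [130, 127, 35, 240],
]
bw = threshold(gray)
for row in bw:
    print(row)
```

Note how the values 130 and 127 look nearly identical in grayscale but land on opposite sides of the cutoff, which is why the choice of threshold affects faint marks and small text.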

Page 21: An Introduction to Document Scanning, Understanding Your Requirements

Understanding Grayscale

Grayscale is used when the image contains color or grayscale data and the tone of the image needs to be retained, e.g. photographs or shaded graphics.

Page 22: An Introduction to Document Scanning, Understanding Your Requirements

Understanding Color

Color is obviously used when the image contains color data. Some users wish to retain only important color information, for example land boundaries or graphical data, and not letterhead logos, highlighter marks, etc.

Page 23: An Introduction to Document Scanning, Understanding Your Requirements

File Storage Requirements

Bits per pixel: color 24, grayscale 8, black and white 1

Page 24: An Introduction to Document Scanning, Understanding Your Requirements

So the storage requirements for a grayscale image are 8 times larger than for a black and white image, and the requirements for a color image are 24 times larger than for black and white.

And remember, Einstein, larger files equal higher costs.
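As a rough worked example of the 1 : 8 : 24 ratio, the sketch below computes raw (uncompressed) image sizes. The 8.5 x 11 inch page at 300 dpi is an assumed example, and the calculation ignores file headers, row padding, and compression, all of which change real file sizes considerably.

```python
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes, ignoring headers, padding, and compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


# Assumed example: one side of an 8.5 x 11 inch page scanned at 300 dpi.
for mode, bits in BITS_PER_PIXEL.items():
    size = uncompressed_bytes(8.5, 11, 300, bits)
    print(f"{mode}: {size:,} bytes ({size / 1_000_000:.1f} MB)")
```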

Page 25: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 4: File Formats

TIFF

JPEG

PDF

For an in-depth look visit: PDF v. TIFF

Page 26: An Introduction to Document Scanning, Understanding Your Requirements

Understanding TIFF (Tagged Image File Format)

• Well-established format

• Most often used for black and white documents

• Supports multiple pages

• Interpreted correctly by most applications, with a caution on certain color implementations

• “Group 4” refers to the compression method used on black and white images, which is a “lossless” compression where original data is not lost in compression/decompression.

Page 27: An Introduction to Document Scanning, Understanding Your Requirements

Understanding PDF (Portable Document Format)

• Well-established format created by Adobe

• Supports color, grayscale, and black and white

• Supports multiple pages

• Images are generally stored using Group 4 or JPEG compression, although other compression methods are supported too.

• Used when more advanced features are needed within the file, such as embedded Optical Character Recognition (OCR) text, hyperlinking, digital signing and other security features.

Page 28: An Introduction to Document Scanning, Understanding Your Requirements

Understanding PDF Variations

Searchable PDF:

Many scanning applications can create searchable PDF files. Here, the scanning application applies OCR technology to make the file text searchable. Your application may label this as “make searchable”, “apply OCR”, “text-under-image” or “searchable PDF.” If selected, your file will be text searchable or text selectable within the Acrobat viewer and many other programs that search PDF files.

Page 29: An Introduction to Document Scanning, Understanding Your Requirements

Understanding PDF Variations

PDF/A:

PDF/A is an ISO standard for digital preservation or archiving of electronic documents.

It differs from standard PDF by omitting features not suited to long-term archiving, such as font linking (as opposed to font embedding).

Adoption is growing in international government and industry segments, including legal systems, libraries, newspapers, and regulated industries.

Page 30: An Introduction to Document Scanning, Understanding Your Requirements

Understanding JPEG (Joint Photographic Experts Group)

• Well-established format

• Most often used for photographs and graphics

• Supports a single page only

• A “lossy” compression format; that is, some of the data is lost during compression. However, it provides good compression ratios for grayscale and color images.

Page 31: An Introduction to Document Scanning, Understanding Your Requirements

Compression and File Size

*Comparison courtesy of Wikipedia

OMG, right?

Page 32: An Introduction to Document Scanning, Understanding Your Requirements

The bottom line: experiment with your images and file size. A middle quality scan may meet your needs and save tremendous file space.

Page 33: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 5: Indexing

For an in-depth look visit: What is Document Indexing?

Page 34: An Introduction to Document Scanning, Understanding Your Requirements

What is Indexing?

Document indexing (the index data is sometimes referred to as metadata) enables users to quickly and efficiently locate their documents, whether through a folder structure, a database, or an electronic document management system.

Page 35: An Introduction to Document Scanning, Understanding Your Requirements

Avoid a disaster

Page 36: An Introduction to Document Scanning, Understanding Your Requirements

Great care should be taken to design an efficient indexing scheme. If the design is not devised correctly at the outset, trying to rectify it later can be both difficult and costly.

Sometimes it makes sense to replicate the current manual method for document location to create a familiar, but faster system.

Page 37: An Introduction to Document Scanning, Understanding Your Requirements

Don’t worry, there is automation

Technologies such as

• Barcode recognition

• OCR

• Batch processing

• Data Mining, Text Mining

can save time and money by automating indexing and more.

Page 38: An Introduction to Document Scanning, Understanding Your Requirements

Using Barcodes for Indexing

Intelligent data capture software can extract data from barcodes to create and send index information to a document management system.

For an in-depth look at barcodes in data capture visit: What Can Barcodes Do For Me?
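As an illustration of turning a decoded barcode value into index information, here is a minimal sketch. The pipe-delimited layout and the field names are hypothetical; a real project would define its own barcode content, and the decoding of the barcode image itself is assumed to be handled by the capture software.

```python
from datetime import date

# Hypothetical index fields encoded in the barcode, in order.
FIELD_NAMES = ("document_type", "document_id", "document_date")


def index_from_barcode(value, separator="|"):
    """Split a decoded barcode string into named index fields."""
    parts = value.strip().split(separator)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    record = dict(zip(FIELD_NAMES, parts))
    # Store the date as a real date so it can be sorted and searched by range.
    record["document_date"] = date.fromisoformat(record["document_date"])
    return record


# Example value as it might come back from a barcode reader (made up).
record = index_from_barcode("INVOICE|000123|2015-01-19")
print(record)
```

The resulting record is the kind of index data that could then be passed on to a folder naming scheme, database, or document management system.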

Page 39: An Introduction to Document Scanning, Understanding Your Requirements

With OCR, make your image-based file fully text searchable or extract data from a zone for indexing.

Page 40: An Introduction to Document Scanning, Understanding Your Requirements

Using OCR for Indexing

With zonal OCR, document areas are identified for automatic OCR capture. Additionally, drag-and-drop OCR allows an operator to highlight document text which is automatically OCR'd and dropped into index fields.
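The zonal idea can be sketched without any particular OCR engine. The example below assumes an OCR step has already produced words with page positions (the `Word` records here are made-up data) and simply collects the words that fall inside a named zone; the zone coordinates and field names are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge, inches from the left of the page
    y: float  # top edge, inches from the top of the page


class Zone(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def text_in_zone(words, zone):
    """Return the text of all words whose position falls inside the zone."""
    inside = [
        w for w in words
        if zone.left <= w.x <= zone.right and zone.top <= w.y <= zone.bottom
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)


# Hypothetical zones for an invoice layout, in inches.
ZONES = {
    "invoice_number": Zone(left=6.0, top=0.5, right=8.0, bottom=1.0),
    "customer": Zone(left=0.5, top=2.0, right=4.0, bottom=2.5),
}

# Made-up OCR output for one page.
words = [
    Word("Invoice", 1.0, 0.6),
    Word("INV-000123", 6.2, 0.7),
    Word("Corp", 1.4, 2.2),
    Word("Acme", 0.8, 2.2),
]

index_fields = {name: text_in_zone(words, zone) for name, zone in ZONES.items()}
print(index_fields)
```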

Page 41: An Introduction to Document Scanning, Understanding Your Requirements

TIPS for OCR

• Scan at 300 dpi for greater accuracy and to ensure that small text is captured.

• Limit the use of color on documents.

• Pre-process the image with image enhancement software (available in many data capture products).

Page 42: An Introduction to Document Scanning, Understanding Your Requirements

What is Batch Processing?

Intelligent data capture solutions often use batch processing, which lets you process a whole folder of documents at a time. Some products can “watch” folders and process files as they are scanned into the folder.

For an in-depth look visit: What is Batch Document Processing?

Page 43: An Introduction to Document Scanning, Understanding Your Requirements

Processing can include indexing, file routing, file splitting, and cleaning/enhancing the scans.
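A watched folder can be sketched with nothing more than polling. In the example below, the folder names, the list of file extensions, and the "processing" step (simply moving each file to an output folder, i.e. basic file routing) are all assumptions for illustration; commercial products do far more.

```python
import shutil
import time
from pathlib import Path

# Assumed set of scan file types to pick up.
SCAN_EXTENSIONS = (".pdf", ".tif", ".tiff", ".jpg", ".jpeg")


def process_batch(inbox, outbox, extensions=SCAN_EXTENSIONS):
    """Process every scan currently in the inbox folder; return the file names handled."""
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(inbox.iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            # Placeholder processing step: route the file to the output folder.
            shutil.move(str(path), str(outbox / path.name))
            processed.append(path.name)
    return processed


def watch_folder(inbox, outbox, poll_seconds=5.0, max_polls=None):
    """Poll the inbox and process new scans as they arrive."""
    polls = 0
    while max_polls is None or polls < max_polls:
        process_batch(inbox, outbox)
        polls += 1
        time.sleep(poll_seconds)
```

Calling `process_batch("scans/inbox", "scans/processed")` handles one batch; `watch_folder` with the same arguments keeps checking until it is stopped.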

Page 44: An Introduction to Document Scanning, Understanding Your Requirements

Lesson 6: Document Prep and Estimating Volumes

Page 45: An Introduction to Document Scanning, Understanding Your Requirements

Preparation, quality control and indexing are the most time-consuming elements of any scanning job and usually the most costly.

Page 46: An Introduction to Document Scanning, Understanding Your Requirements

Typically, a good operator can prepare 750 to 1,000 documents per hour; however, a number of factors may drop throughput to 300 to 500 documents per hour.
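To see what these throughput figures mean for a job, here is a small sketch. The 10,000-document job size is an assumed example; the rates are the ones quoted above.

```python
def prep_hours(documents, documents_per_hour):
    """Estimated operator hours needed to prepare the given number of documents."""
    if documents_per_hour <= 0:
        raise ValueError("documents_per_hour must be positive")
    return documents / documents_per_hour


# Assumed job size of 10,000 documents at the quoted best and worst rates.
for rate in (1000, 750, 500, 300):
    print(f"{rate} documents/hour -> {prep_hours(10_000, rate):.1f} hours")
```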

Page 47: An Introduction to Document Scanning, Understanding Your Requirements

Factors that Influence Document Prep

Odd size/document type: sales receipts, photos, plans/drawings

Bindings: three ring, spiral, glue, folder

Fasteners: staples, paper clips, binder clips, rubber bands

Attachments: Post-its, tabs

Page 48: An Introduction to Document Scanning, Understanding Your Requirements

Estimating Volumes and Storage

Type                 Simplex (avg #s)   Duplex (avg #s)
Paper folders        30 to 100          60 to 200
Ring binder          200                400
Lever arch folder    500                1000
Transfer cases       500                1000
Bankers boxes        500                1000
Archive boxes        2500               5000
Filing cabinets      3000/drawer        6000/drawer

Learn more about estimating volumes
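The averages in the volume table can be turned into a quick estimate. The sketch below is a rough calculation only; it assumes the table's figures count scanned images per container, and the container names used as dictionary keys are simply labels chosen for this example.

```python
# Average counts per container from the volume table, as (low, high) ranges.
AVERAGE_COUNTS = {
    "paper folder": {"simplex": (30, 100), "duplex": (60, 200)},
    "ring binder": {"simplex": (200, 200), "duplex": (400, 400)},
    "lever arch folder": {"simplex": (500, 500), "duplex": (1000, 1000)},
    "transfer case": {"simplex": (500, 500), "duplex": (1000, 1000)},
    "bankers box": {"simplex": (500, 500), "duplex": (1000, 1000)},
    "archive box": {"simplex": (2500, 2500), "duplex": (5000, 5000)},
    "filing cabinet drawer": {"simplex": (3000, 3000), "duplex": (6000, 6000)},
}


def estimate_images(containers, duplex=False):
    """Return a (low, high) estimate for a dict of container name -> quantity."""
    side = "duplex" if duplex else "simplex"
    low = high = 0
    for name, quantity in containers.items():
        per_low, per_high = AVERAGE_COUNTS[name][side]
        low += per_low * quantity
        high += per_high * quantity
    return low, high


# Assumed example inventory.
inventory = {"archive box": 2, "ring binder": 3, "paper folder": 10}
print("simplex:", estimate_images(inventory))
print("duplex:", estimate_images(inventory, duplex=True))
```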

Page 49: An Introduction to Document Scanning, Understanding Your Requirements

Homework: Learn More About Data Capture and Document Management

Page 50: An Introduction to Document Scanning, Understanding Your Requirements

Document Management

Determine whether you require a full document management system or just a simple search and retrieval system.

Can a simple search and retrieval system serve as a stepping stone while you evaluate a document management system?

Page 52: An Introduction to Document Scanning, Understanding Your Requirements

By DocuFi

Makers of ImageRamp Data Capture Solutions

30 years’ Experience in the Document Imaging Market

Proven Fujitsu ISV Partner

Find out more at ImageRamp and

www.docufi.com

Page 53: An Introduction to Document Scanning, Understanding Your Requirements

Image Credits

All images are owned or licensed by DocuFi with acknowledgement given to:

• Pjohnkeane, Requirements, requirements, requirements, http://bit.ly/1fcULDf
• Doug Waldron, “Files (85)”, http://bit.ly/1bfciII
• UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P
• Knile Lucy, you have some sorting to do! http://bit.ly/19bSgjF
• Michael 1952, SJSA Fifth Grade - I Fell in Love With The Teacher, http://bit.ly/1eevu9A
• Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi
• Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv
• Tax Credits, “Coins”, http://bit.ly/1mtQj5j
• j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx
• Marcin Wichary, Alphabetical, http://bit.ly/1aILOku
• David Erickson e-strategyblog.com, “Hindenburg Disaster”, http://bit.ly/1jASeFF
• William Warby w warby, “Gears”, http://bit.ly/1dwtU1S
• Alan Cleaver, “watching”, http://bit.ly/1h1k9k7
• Zoetnet, “overflowing”, http://bit.ly/KHW9Em
• Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE
• Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM
• Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE