10 Things you didn’t know about KTM
[email protected] Solution Enablement Specialist
What is KTM?
Kofax’ Answer to Document Drudgery
Kofax’ Intelligent Document Recognition Solution/Toolkit/Platform
The Golden Rule of KTM
User productivity?
Automation?
Benefits of “User Productivity”
Wholesaler opens its 17th store
Pan European Wholesaler | invoices/person/day | Productivity Improvement
Manual processing without Kofax | 800 |
After 3 months of “Accuracy” effort by PS | 1200 | +50%
After 2 weeks of “User Productivity” effort by PS | 2500 | > 3 x
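The improvement figures in the wholesaler comparison can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `improvement` is illustrative and not part of any Kofax tooling, and the rates are the invoices/person/day values quoted above.

```python
def improvement(baseline: float, current: float) -> tuple[float, float]:
    """Return (multiple of baseline, percentage gain over baseline)."""
    factor = current / baseline
    return factor, (factor - 1.0) * 100.0


baseline = 800  # invoices/person/day with manual processing

for label, rate in [("Accuracy effort", 1200), ("User Productivity effort", 2500)]:
    factor, pct = improvement(baseline, rate)
    print(f"{label}: {factor:.2f}x baseline ({pct:+.0f}%)")
```

The first effort works out to 1.5 times the baseline (+50%), the second to a little over 3 times the baseline, which matches the "> 3 x" entry.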
The Fallacy of OCR Accuracy
What OCR accuracy do you have?
What is the straight-through processing rate?
How much can we automate?
85% straight-through processing
23 fields → 99.29% field accuracy
6 chars/field → 99.89% character accuracy
What is the cost of the other 15%?
You will lose this deal against an OCR Provider because this deal is being fought over features and tech, and not business value
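The relationship between straight-through processing and per-field or per-character accuracy can be sketched as follows. The sketch assumes that errors are independent and equally likely for every field and every character, which is a simplification made here and not stated on the slide; `required_accuracy` is an illustrative name.

```python
def required_accuracy(target_rate: float, units: int) -> float:
    """Per-unit accuracy needed so that all `units` are right with probability `target_rate`.

    Assumes independent, identically distributed errors (a simplification).
    """
    return target_rate ** (1.0 / units)


stp = 0.85           # straight-through processing rate
fields_per_doc = 23
chars_per_field = 6

field_acc = required_accuracy(stp, fields_per_doc)
char_acc = required_accuracy(field_acc, chars_per_field)

print(f"required field accuracy:     {field_acc:.2%}")
print(f"required character accuracy: {char_acc:.2%}")
```

Under these assumptions the results land within about 0.01 percentage points of the figures quoted above. The point of the exercise is that even a very high character accuracy still leaves 15% of documents needing a human.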
Productivity vs Automation
Productivity: documents/person/day; user focused; business value; optimizing core-business processes; usability/comfort 8hrs/day; saving $€; limit = ∞
Automation: accuracy numbers; technology focused; impossible to convert to ROI; technology; diminishing returns; limit = 100%
Anyone can do KTM
Classify Separate Extract Folder
Validate Learning
All you need is paper and highlighters
Classify Separate Extract Folder
Validate
“Doing” KTM by hand. Paper to Excel.
Classic vs Quantum
Newton Einstein
Schrödinger
Bohr
“God doesn’t play dice.” “Spooky action at a distance.”
Programmable vs Learning Software
Programmable: deterministic, logic, rules, laws, order
Learning: probabilistic, data-driven, machine learning
Analytics
Transition from Determinism to Data-driven/Fuzzy/Quantum
Physics 1890 – 1920 (Classical to Quantum)
Mathematics 1931 (Kurt Gödel’s incompleteness theorems: a consistent formal system cannot prove its own consistency)
Computer Science 1970 – 1990 (machine learning, neural networks, speech recognition, machine translation)
Business 2000 – 2020 (Big data, analytics, learning systems)
EU’s Human Brain Project & USA’s BRAIN Initiative in 2013
About €1 billion from the EU over 10 years to build a human brain simulator to push forward brain research and study brain diseases.
$100M from the US government to revolutionize our understanding of the human mind and uncover new ways to treat, prevent, and cure brain disorders.
“You don’t program it, it learns”
Don’t program KTM, teach it
Robodog will bite you
“Doing” KTM by hand. Paper to Excel.
Field Analysis
File Class Capa Nro_DOC NOTA CPFC EMIS VENC EMIT
1.tif CapaLote 123987-2012
2.tif Duplicata 123987-2012 852147-A 60.000,00 07.248.659/0001-03 15/02/2013 15/02/2013 Y
3.tif Duplicata 123987-2012 1489/1 15.963,57 17.155.342/0003-45 01/12/2011 21/01/2012 Y
4.tif Duplicata 123987-2012 3112230U 2.195,30 86.438.280/0001-30 22/12/2011 19/01/2012 Y
5.tif Duplicata 123987-2012 4012391 81.045,00 10.932.276/0001-61 14/12/2011 23/01/2012 Y
6.tif Duplicata 123987-2012 3065357 F 1009,11 80.089.964/0001-97 27/10/2011 21/09/2012 Y
7.tif Nota Fiscal 123987-2012 65357 7.981,39 80.089.964/0001-97
8.tif Nota Fiscal 123987-2012 194.580 48.741,92 76.777.556/0001-50
9.tif Nota Fiscal 123987-2012 000.022.875 32.650,74 56.990.526/0001-10
10.tif Nota Fiscal 123987-2012 112230 2.195,30 86.438.280/0001-30
11.tif Nota Fiscal 123987-2012 194.562 7.454,92 03.364.370/0001-46
Overview of fields to extract – What a customer typically gives
field | format | number of characters | Document type (Loss, Payment Rates, Budget, Invoice, multi-invoice) | Validation with
Reference number numeric 6-7
x New ones have no ref-nr!
x x
Possibly more than one per doc
x
x Multiple
Ref-Nr per Document
CIP database
Debtor Last Name Text unlimited x x x x
only for validation, if existing
only for validation, if existing
CIP database in combination with Ref-Nr
Debtor First Name Text unlimited x x x x
only for validation, if existing
only for validation, if existing
CIP database in combination with Ref-Nr
Debtor Street Text unlimited x x x x CIP database in combination with Ref-Nr
Debtor House number numeric unlimited x x x x
CIP database in combination with Ref-Nr
Debtor Address2 Text unlimited x x x x
CIP database in combination with Ref-Nr
Debtor PostCode numeric 4 Swiss only Swiss only x Swiss only
only for validation, if existing
only for validation, if existing
CIP database in combination with Ref-Nr
Debtor City Text unlimited x x x x only for
validation, if existing
only for validation, if existing
CIP database in combination with Ref-Nr
Debtor Telephone numeric x
CIP database in combination with Ref-Nr. If there is no other number in database, then manual validation.
+42 more rows
The most successful KTM projects focus on the user.
Make your users happy and content.
KTM is their workplace all day every day.
It is the place of encounter and collaboration between human and robot.
Human – Computer Interaction
Validation Experience
Result Type | Correct | Valid | User Experience
True Positive | yes | yes | Perfect! Touchless processing. Automation.
False Negative | yes | no | User must press ENTER.
True Negative | no | no | User corrects/types data.
False Positive | no | yes | Loss of trust. Drop of productivity. Bad data leaves Kofax.
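The four result types follow from two yes/no questions per field: was the extracted value actually correct, and did the system mark it as valid? The sketch below shows that mapping and a tally over a batch. The function name, the sample batch, and the "touchless rate" calculation are illustrative assumptions, not KTM features.

```python
from collections import Counter


def result_type(correct: bool, valid: bool) -> str:
    """Map (correct, valid) to the result types used in the validation experience."""
    if valid:
        # The system trusted the value: fine if it was right, dangerous if not.
        return "True Positive" if correct else "False Positive"
    # The system asked the user: a needless ENTER if it was right, real work if not.
    return "False Negative" if correct else "True Negative"


# Hypothetical batch of extracted fields as (correct, valid) pairs.
batch = [(True, True), (True, True), (True, False), (False, False), (False, True)]

tally = Counter(result_type(c, v) for c, v in batch)
touchless_rate = tally["True Positive"] / len(batch)

for name, count in tally.items():
    print(f"{name}: {count}")
print(f"touchless: {touchless_rate:.0%}")
```

Counting the result types separately keeps the focus on what the user experiences. A false positive and a true negative are both wrong values, but only the false positive lets bad data through unnoticed.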
KTM Customer Query
This [deal] was sold on the strength of KTM being able to classify and extract data from items received…. This was then used to calculate the ‘RETURN on INVESTMENT’(ROI) which enabled them to purchase the solution. The ROI was calculated with the reasonable estimate of 65% automated processing. I would expect that we should realistically see 80% to 90% automated processing of inbound items. That said, someone communicated to the client that the best they were going to see was 15% to 20% automated processing. This obviously sent the client reeling that they weren’t going to see anything close to their expected ROI and would potentially damage their business and not see the benefits from the system as expected.
So what is a reasonable expectation of KTM?
KTM should be able to significantly improve user productivity (perhaps 1.5-10x).
KTM will be able to extract information perfectly from readable, known documents.
KTM should be able to learn to understand readable but unknown documents.
KTM’s value is in improving documents/person/day.
Transactions/second (TPS): you will have access to near-real-time performance graphs that can help optimize user experience and data throughput.
Reasonable Customer Expectations
Benchmark Before
Benchmark During
Benchmark After
US invoices – known vendors
Goals of every KTM Project
1. Human Productivity
2. Eliminate False Positives: bad data leaving Kofax
3. Reduce False Negatives: user pressing ENTER
4. Few True Negatives: OCR accuracy, database problems & learning
Fuzziness is your friend
Kofax brings messy data from the real world into the clear digital world
Fuzziness is not: random, unpredictable, unreliable, complex
Fuzziness is: simple, learning, flexible, tolerant
Fuzzy software you love: Google Autocomplete, spell checkers, grammar checkers, spam filters, “Users who read this book…”
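A spell checker is the easiest of these to imitate. The sketch below uses Python's standard `difflib` module to map an OCR-damaged surname onto a list of known names. It illustrates tolerant matching in general and is not how KTM implements it; the name list, the function `correct_name`, and the 0.6 cutoff are assumptions chosen for the example.

```python
import difflib

# Sample reference list (assumption: any trusted list of valid values would do).
KNOWN_SURNAMES = ["SMITH", "JOHNSON", "WILLIAMS", "JONES", "BROWN"]


def correct_name(ocr_text, known=KNOWN_SURNAMES, cutoff=0.6):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(ocr_text.upper(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(correct_name("J0HNS0N"))  # zeros read in place of the letter O
print(correct_name("XQZ"))      # shares no characters with any known name
```

The cutoff is the tolerance knob: lower values forgive more damage but raise the risk of confidently matching the wrong name, which is exactly the false positive a project wants to avoid.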
Top Names US Census 2005
Top US Names 2005
Rank | Male | Female | Surname
1 | JAMES | MARY | SMITH
2 | JOHN | PATRICIA | JOHNSON
3 | ROBERT | LINDA | WILLIAMS
4 | MICHAEL | BARBARA | JONES
5 | WILLIAM | ELIZABETH | BROWN
6 | DAVID | JENNIFER | DAVIS
7 | RICHARD | MARIA | MILLER
8 | CHARLES | SUSAN | WILSON
9 | JOSEPH | MARGARET | MOORE
10 | THOMAS | DOROTHY | TAYLOR
Swiss Forenames
Rank | Forename | Percent
1 | Peter | 0.80%
2 | Daniel | 0.80%
3 | Hans | 0.67%
4 | Christian | 0.60%
5 | Thomas | 0.53%
6 | Walter | 0.52%
7 | Michel | 0.49%
8 | Martin | 0.46%
9 | René | 0.45%
10 | Markus | 0.45%
11 | Josef | 0.44%
12 | Patrick | 0.43%
13 | André | 0.42%
14 | Bruno | 0.41%
15 | Philippe | 0.40%
16 | Maria | 0.40%
17 | Andreas | 0.40%
18 | Roland | 0.39%
19 | Paul | 0.39%
20 | Marcel | 0.39%
21 | Werner | 0.37%
22 | Antonio | 0.36%
23 | Pierre | 0.35%
24 | Urs | 0.34%
25 | Elisabeth | 0.34%
Uses
37,691,912 citizens
KTM is the heart of Kofax.
Touchless Processing
KTM Search & Match Server
[Diagram labels: Search & Match Server, SQL Database, Database Center, Firewall, CSV File, Database Fuzzy Index]
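The idea of a fuzzy lookup against exported database records can be sketched in plain Python. This is not the Search & Match Server's algorithm or API, which the slides do not describe. The CSV content, the column names, and the `best_match` function are hypothetical, and a linear scan with `difflib` stands in for a real fuzzy index.

```python
import csv
import difflib
import io

# Hypothetical vendor master data standing in for the CSV export.
CSV_DATA = """id,name,city
1001,Acme Paper Supplies,Springfield
1002,Northwind Traders,Portland
1003,Contoso Wholesale,Denver
"""


def load_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def best_match(query, records, cutoff=0.6):
    """Return (record, score) for the most similar record, or (None, score) below cutoff."""
    best, best_score = None, 0.0
    for rec in records:
        key = f"{rec['name']} {rec['city']}".lower()
        score = difflib.SequenceMatcher(None, query.lower(), key).ratio()
        if score > best_score:
            best, best_score = rec, score
    if best_score < cutoff:
        return None, best_score
    return best, best_score


records = load_records(CSV_DATA)
record, score = best_match("Acrne Paper Supplles Springfield", records)
print(record["id"] if record else None, round(score, 2))
```

Scanning every record is fine for a demonstration but does not scale. A real fuzzy index exists precisely so that the lookup stays fast over millions of rows.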
PDF vs TIFF
PDF/A is an ISO standard (ISO 19005-1:2005, ISO 19005-2:2011) and future safe.
Thousands of Incompatible File Formats
Baseline TIFF readers don’t have to be able to read Group 4.
Any computer can read a PDF, and Chrome can do so “natively”.
TIFF viewers need to be installed.
PDF has layers, TIFF does not.
PDF is searchable.
PDF compresses better.
TIFFs can be manipulated.
PDFs have certificates, encryption, DRM, etc.
PDF High Compression – Should be in every project
602 kb
76 kb (87%)
553 kb
B&W PDF
117 kb
47 kb
114 kb
Atalasoft for PDF generation