Towards e-Library of Telugu literature
Rajesh Babu Arja (Roll no: 09305914)
Guide: Prof. J. Saketha Nath
Co-Guide: Prof. Parag Chaudhuri
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
October 31, 2011
1 Introduction
2 Work done in stage 1
3 Binarization and Skew detection
4 Segmentation and Noise removal
5 Line Segmentation
6 Classification
7 Post Processing
8 Conclusion and Future Work
What do we want to do?
Digitize Telugu literature
Why do we want to do it?
For easy cataloging of documents
To enable search over a large body of literature
To preserve Telugu literature
To encourage transliteration of Telugu documents into other languages
How can we do it?
By implementing OCR for Telugu
What is OCR?
Figure: Optical Character Recognition (OCR)
Applications
Library cataloging
Base for many NLP applications.
Applications in banks and post offices.
Our Scope
Scanned Telugu text documents
Documents with printed text
Documents without any images
Challenges involved
The resolution of the scanned document is not under our control
Noise can be present in the scanned document
Why is Telugu OCR difficult?
Telugu vs English
Property | English | Telugu
No. of classes | 52 | more than 400
No. of connected components per character | 1 | 1-5
Dependency among connected components | No | Yes
Confusion characters | No | Yes
Table: Comparison of Telugu and English languages
Confusion characters
Figure: Confusion Characters example
Why is Telugu OCR difficult?
Telugu vs Other Indian Languages
Property | Devanagari | Telugu
Word segmentation | easy | difficult
Confusion characters | less | more
Table: Comparison of Telugu and Devanagari languages
Property | Tamil | Telugu
Vowel modifiers | connected | disconnected
Table: Comparison of Telugu and Tamil languages
Comparison of Telugu OCR
S.No | Features | Classification | Font independent | Noise free | Open source
[1] | Primitive | Decision trees | No | No | n/a
[2] | Circular | Template match | Yes | No | n/a
[3] | Wavelet | Neural networks | No | No | n/a
[4] | Fringe map | Template matching | No | No | Yes
[5] | Gradient dir | Nearest neighbor | No | No | n/a
6 | ? | ? | Yes | Yes | Yes
Table: Comparison of Telugu OCR literature
Comparison with state-of-the-art Telugu OCR
S.No | Input | Output | Font independent | Noise free | Open source
1 | PGM | ACI | No | No | Yes
2 | Any image format | txt/pdf | Yes | Yes | Yes
Our Goal
Our goal is to implement an OCR system with the following features:
Font independent
Works on noisy data
Capable of digitizing documents in the Digital Library of India (DLI)
Open source project
Web interface for OCR
Work done in stage 1
Implemented an end-to-end system
Explored the various stages of OCR
Implemented each stage with basic methodologies
Observed the problems involved in each stage
Identified potential improvements for each stage
Stages of OCR
Binarization
Skew detection and Correction
Noise removal
Segmentation of lines, words and characters
Feature extraction and classification
Rendering data
What is Binarization?
Figure: Binarization example
Binarization
Our Methodology:
Using global thresholding
Using Java Advanced Imaging API [1]
Results:
Obtained more than 90% accuracy.
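Global thresholding can be sketched as follows. This is a minimal illustration in Python, not the Java Advanced Imaging code used in the project. The source does not say how the single threshold was chosen, so Otsu's method is swapped in here as one common choice; the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(gray):
    """Pick one global threshold for a grayscale image (rows of ints in 0..255).

    Assumption: Otsu's method, which maximizes the between-class variance
    of the dark and light pixel groups. The source only says "global
    thresholding" and does not name a selection rule.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(256):
        dark_count += hist[t]
        dark_sum += t * hist[t]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue  # one class is empty, variance is undefined
        mean_dark = dark_sum / dark_count
        mean_light = (weighted_total - dark_sum) / light_count
        between_var = dark_count * light_count * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray):
    """Return a 0/1 image where 1 marks ink (pixels at or below the threshold)."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]
```

A single threshold for the whole page is cheap to compute, but it can fail on unevenly lit scans, where a locally adaptive threshold may be needed.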
What is Skew Detection?
Figure: Skew Correction example
Skew Detection and Correction
Our Methodology:
Using the Hough transform
Using the Java Deskew implementation [2]
Results:
Corrects skew angles of up to ±20 degrees.
Limitations:
Does not work for multi-skewed documents
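The idea behind Hough-based skew detection can be sketched as below. This is an illustrative Python version under stated assumptions, not the Java Deskew implementation cited above: the function name `estimate_skew`, the 0.5 degree step, and the use of the single strongest accumulator bin as the score are all choices made for this sketch.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=20.0, step=0.5):
    """Estimate the skew angle (degrees) of text from black-pixel coordinates.

    points: iterable of (x, y) positions of ink pixels.
    For each candidate angle, every point votes for the offset
    rho = y*cos(theta) - x*sin(theta) of the line through it at that angle.
    Points lying on one straight text line share the same rho only at the
    true angle, so the angle with the strongest single bin wins.
    Assumption: skew lies within +/- max_angle (20 degrees, as in the source).
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        c, s = math.cos(theta), math.sin(theta)
        votes = Counter(round(y * c - x * s) for x, y in points)
        score = max(votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Rotating the page by the negative of the returned angle would then correct the skew. Because a single angle is estimated for the whole page, this sketch shares the limitation noted above for multi-skewed documents.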
What is Segmentation?
Figures: line segmentation example; connected component segmentation example
Why Segmentation?
Advantages of Segmentation
To make processing easier in next stages
To minimize number of classes
Different types of segmentation
Line segmentation
Word segmentation
Connected component segmentation
Connected component segmentation
Our Methodology
Found 8-way connected components
Implemented an algorithm similar to flood fill [3]
Figure: Connected components example
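A flood-fill style search for 8-way connected components might look like the sketch below. It assumes a binarized page stored as rows of 0/1 values with 1 meaning ink; the function name `connected_components` is hypothetical and the code is not taken from the project.

```python
def connected_components(img):
    """Return the 8-connected components of ink pixels in a 0/1 image.

    Each component is a list of (row, col) positions. An explicit stack
    replaces recursion so that large components do not overflow the call stack.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            stack, component = [(r, c)], []
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                # Visit all 8 neighbours, diagonals included.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            components.append(component)
    return components
```

Using 8-way rather than 4-way connectivity keeps strokes that touch only at a diagonal in the same component.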
Why Line Segmentation?
Characters are not labeled sequentially
To bundle the consonant modifiers with their corresponding characters
To process the document in order
Our Methodology
Modeled each line as consisting of four states
Computed the histogram of black-pixel density in each line
Used a Hidden Markov Model (HMM) to segment lines
Located line boundaries from the state changes
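The steps above can be illustrated with a small Viterbi decoder over the row histogram. The source uses four states per line but does not say what they are, so this sketch collapses them to two hypothetical states (gap and text). The probabilities `p_stay` and `p_emit`, the ink threshold, and all function names are assumptions made for illustration only.

```python
import math


def row_histogram(img):
    """Count ink pixels (value 1) in every pixel row of a 0/1 image."""
    return [sum(row) for row in img]


def viterbi_rows(hist, ink_threshold=1, p_stay=0.9, p_emit=0.9):
    """Label each pixel row as 0 (gap) or 1 (text) with a two-state HMM.

    A row is observed as "inked" when its count reaches ink_threshold.
    p_stay is the probability of remaining in the same state on the next
    row, p_emit the probability that a row's observation matches its state.
    """
    if not hist:
        return []
    obs = [1 if count >= ink_threshold else 0 for count in hist]

    def log_emit(state, o):
        return math.log(p_emit if o == state else 1.0 - p_emit)

    def log_trans(prev, state):
        return math.log(p_stay if prev == state else 1.0 - p_stay)

    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[p] + log_trans(p, s) for p in (0, 1)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + log_emit(s, o))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the final row.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def line_bounds(path):
    """Turn state changes into (first_row, last_row) pairs for each text line."""
    bounds, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            bounds.append((start, i - 1))
            start = None
    if start is not None:
        bounds.append((start, len(path) - 1))
    return bounds
```

Compared with cutting at every empty row, the transition probabilities make the decoder tolerate an isolated empty row inside a text line, such as the space between a base character and a modifier below it.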
Example
Figures: before line segmentation; after line segmentation
Example
Figures: before line segmentation [5]; after line segmentation
Observations and Future Work
Observations
Works well on documents with uniform spacing
Documents with non-uniform spacing are not segmented properly
Planned Improvements:
Apply conditional random fields (CRF) to line segmentation.
Feature Vector generation and classification
Our Methodology
Using 0/1 feature vector
Generated synthetic data for three Telugu fonts
Using Euclidean distance for classification
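Classification by Euclidean distance on 0/1 feature vectors can be sketched as a nearest-template search. The sketch assumes that the 0/1 feature vector is simply the flattened pixels of a glyph already scaled to a fixed size, which the source does not state explicitly. The function names and the tiny 3x3 templates in the usage note are hypothetical.

```python
import math


def to_feature_vector(glyph):
    """Flatten a fixed-size 0/1 glyph image into a single 0/1 vector."""
    return [pixel for row in glyph for pixel in row]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(glyph, templates):
    """Return the label of the template vector closest to the glyph.

    templates maps a label to its 0/1 feature vector, for example one
    built from synthetic font renderings.
    """
    vector = to_feature_vector(glyph)
    return min(templates, key=lambda label: euclidean(vector, templates[label]))
```

For example, with `templates = {"bar": [0,1,0, 0,1,0, 0,1,0], "ring": [1,1,1, 1,0,1, 1,1,1]}`, a vertical bar with one stray pixel is still closer to `"bar"` than to `"ring"`.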
Feature Vector generation and classification
Observations
Dividing the characters into groups should improve accuracy.
Experiments
Measured accuracy without classifying connected components into groups
Measured accuracy with connected components classified into groups
Feature Vector generation and classification
Results
Method | Without grouping | With grouping
0/1 | 33.75% | 41.58%
SIFT | 10.40% | 15.97%
Table: Classification accuracy statistics
Post Processing
Our Methodology
Created a property file that maps each character label to its corresponding Unicode
The Unicode corresponding to the classified character label is picked
Identified the character position and applied position-specific handling
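The label-to-Unicode lookup can be sketched as below. The property file format is not given in the source, so this assumes `label=hex code points` lines in the style of a Java properties file. The labels `ka` and `aa_sign` and both function names are hypothetical; the code points U+0C15 (Telugu letter KA) and U+0C3E (Telugu vowel sign AA) are from the Unicode Telugu block.

```python
def load_label_map(properties_text):
    """Parse 'label=HEX [HEX ...]' lines into a label -> string mapping.

    Blank lines and lines starting with '#' are ignored.
    Assumption: code points are written in hexadecimal, separated by spaces.
    """
    mapping = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, _, hex_codes = line.partition("=")
        mapping[label.strip()] = "".join(
            chr(int(code, 16)) for code in hex_codes.split()
        )
    return mapping


def render(labels, mapping, unknown="?"):
    """Join the Unicode text for a sequence of classified labels."""
    return "".join(mapping.get(label, unknown) for label in labels)
```

With a file containing `ka=0C15` and `aa_sign=0C3E`, `render(["ka", "aa_sign"], mapping)` yields the two code points in logical order, consonant first and vowel sign second. The position-specific handling mentioned above is not covered by this sketch.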
Example
Figure: Example for rendered output text
Observations
Word spacing is not always reproduced correctly.
Paragraphs beginning with a space are not identified.
Headings in a large font are not handled.
Noise
Sources of Noise
Quality of printer
Quality of scanner
Age of the document
Advantages of Noise removal
Improves recognition results.
Facilitates better processing in next stages.
Experiments and Results
Experiments
EM algorithm with two clusters
EM algorithm with three clusters
K-means clustering with two clusters
Results
Method | Noise clusters removed | Text clusters removed
EM 2-cluster | 70% | 0.01%
Table: Noise removal statistics
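The two-cluster K-means experiment listed above can be illustrated with a minimal one-dimensional sketch. The features actually used are not stated in the source, so this assumes a single feature per connected component (its pixel count) and assumes that the cluster of small components is the noise. The function name `kmeans_two_clusters` is hypothetical, and this is K-means rather than the EM variant reported in the table.

```python
def kmeans_two_clusters(values, iterations=20):
    """Split numbers into a low cluster (label 0) and a high cluster (label 1).

    Centers start at the minimum and maximum value, so the result is
    deterministic. Each pass assigns every value to the nearer center and
    then moves each center to the mean of its members.
    """
    if not values:
        return []
    low, high = float(min(values)), float(max(values))
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        small = [v for v, label in zip(values, labels) if label == 0]
        large = [v for v, label in zip(values, labels) if label == 1]
        new_low = sum(small) / len(small) if small else low
        new_high = sum(large) / len(large) if large else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return labels
```

Components labeled 0 would be dropped as noise under the stated assumption. A size-only feature cannot tell a small legitimate modifier from a speck, which is one reason richer features may be needed.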
Example
Figures: before noise removal; after noise removal
Example
Figures: before noise removal; after noise removal
Observations
Underlines and handwritten characters are not removed
Non-Telugu characters are not completely removed
In some documents, border noise is not removed
Joined Telugu characters are removed as noise
Future Work
Planned Improvements:
Include more features, such as character thickness
Include structural features of character shapes
Include a position feature to remove border noise
Implement methods to detect overlapping characters
Conclusion
Created a basic, functioning end-to-end OCR system.
Binarization with global thresholding gives above 90% accuracy.
Able to correct skew angles of up to ±20 degrees
Able to remove above 70% of the noise from a document
Able to segment lines with more than 95% accuracy
Able to classify characters with more than 30% accuracy
Able to render the characters by finding the matching Unicode
Future Work
Consider more structural features to improve accuracy and noise removal.
Apply CRF for line segmentation
Use language models to correct confusion characters and broken characters
Improve classification models.
Implement a page layout analysis module
Implement a web-based Telugu OCR
References
Rao, P. V. S., and T. M. Ajitha. 1995. Telugu Script Recognition - a Feature Based Approach. Proc. of ICDAR, IEEE, pp. 323-326.
Pujari, Arun K., C. Dhanunjaya Naidu, and B. C. Jinaga. 2002. An Adaptive Character Recognizer for Telugu Scripts using Multiresolution Analysis and Associative Memory. ICVGIP, Ahmedabad.
Negi, Atul, Chakravarthy Bhagvati, and B. Krishna. 2001. An OCR system for Telugu. Proc. of 6th Int. Conf. on Document Analysis and Recognition, IEEE Comp. Soc. Press, USA, pp. 1110-1114.
Lakshmi, C. V., and C. Patvardhan. 2003. A high accuracy OCR for printed Telugu text. Conference on Convergent Technologies for Asia-Pacific Region (TENCON), Volume 2, 15-17, pp. 725-729.
Thank You