Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages
Sawood Alam, Computer Science Department, Old Dominion University, Norfolk, Virginia - 23529
Fateh ud din B Mehmood, National University of Sciences and Technology, Islamabad, Pakistan
Michael L. Nelson, Computer Science Department, Old Dominion University, Norfolk, Virginia - 23529
OK Google, Define Dictionary
A book or electronic resource that lists the words of a language (typically in alphabetical order) and gives their meaning, or gives the equivalent words in a different language, often also providing information about pronunciation, origin, and usage.
Dictionaries Are Different
Read: random access
Write: maintain sort order
The most compact mode to preserve a language
Unicode Collation
Ordered assembly of written information
Unicode values != natural collation
Arabic script: U+0600 to U+06FF
Out of order alphabets in derived languages
Common Locale Data Repository (CLDR)
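The mismatch between Unicode values and natural collation can be illustrated with a minimal Python sketch. The `URDU_ORDER` list and the `collation_key` helper are hypothetical names introduced here, and the list covers only the first eight letters of the Urdu alphabet. A real system would more likely load a full tailoring from CLDR than hard-code one.

```python
# Illustrative, partial Urdu alphabet order (first eight letters only).
# This is an assumption for demonstration, not a complete CLDR tailoring.
URDU_ORDER = [
    "\u0627",  # alef
    "\u0628",  # beh
    "\u067E",  # peh (added for derived languages, so it has a higher code point)
    "\u062A",  # teh
    "\u0679",  # tteh
    "\u062B",  # theh
    "\u062C",  # jeem
    "\u0686",  # tcheh
]
RANK = {ch: i for i, ch in enumerate(URDU_ORDER)}


def collation_key(word):
    """Sort known letters by alphabet position; unknown characters
    fall back to code point order and sort after all known letters."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


letters = ["\u067E", "\u062A", "\u0628"]  # peh, teh, beh

by_code_point = sorted(letters)                   # beh, teh, peh
by_alphabet = sorted(letters, key=collation_key)  # beh, peh, teh

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_alphabet])
```

Sorting by code point places peh (U+067E) after teh (U+062A), although a dictionary reader expects it directly after beh. An index built on raw code points would therefore send the reader to the wrong pages.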
Morphological derivation
Derived word simplification
Radicals and strokes (Chinese)
Indexing: Ordered Pages
Indexing: Sparse Index
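As a rough illustration of how a sparse index might narrow a lookup, the following Python sketch assumes that only some pages have their first headword recorded, and returns the range of scanned pages a reader would still have to browse. The index contents, the `page_range` function, and the `last_page` value are hypothetical. Plain string comparison is used only because the sample headwords are lowercase ASCII; for a complex script the comparison would need collation keys instead.

```python
from bisect import bisect_right

# Hypothetical sparse index: (first headword on the page, page number),
# sorted by headword. Only a few pages are indexed.
SPARSE_INDEX = [("apple", 1), ("dog", 40), ("kite", 95), ("rose", 160)]


def page_range(word, index=SPARSE_INDEX, first_page=1, last_page=200):
    """Return the inclusive (low, high) page range that may contain `word`."""
    keys = [headword for headword, _ in index]
    i = bisect_right(keys, word)  # number of indexed headwords <= word
    # The word cannot appear before the last indexed page whose first
    # headword is <= word, nor on or after the next indexed page.
    low = index[i - 1][1] if i > 0 else first_page
    high = index[i][1] - 1 if i < len(index) else last_page
    return low, max(low, high)


print(page_range("cat"))    # (1, 39)
print(page_range("lion"))   # (95, 159)
print(page_range("zebra"))  # (160, 200)
```

Under this reading, a full index is the limiting case in which every page has an entry, so the returned range shrinks to a single page.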
Indexing: Full Index
Indexing: Location Index
Indexing State Transition
Annotation
Digitization
Dictionary Explorer
Multilingual Multi-dictionary Lookup
Searching and Exploring
Annotation and digitization
User Contribution and Feedback
Open Source => GitHub:/urduweb/DictionaryExplorer
* 75,000 words, phrases, proverbs, and idioms
** 13 contributors
Prefix Permutations
Prefix: One
Prefix: Two
Prefix: Three
Prefix: Four
Prefix: Five
Prefix: Six
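One way to read the per-length prefix slides is as a measure of how well a typed prefix narrows a lookup. The Python sketch below counts how many headwords share each prefix of a given length. The `prefix_match_counts` function and the sample word list are hypothetical, and this is not necessarily how the authors computed their evaluation.

```python
from collections import Counter


def prefix_match_counts(headwords, length):
    """Count how many headwords share each prefix of exactly `length` characters.
    Headwords shorter than `length` are skipped."""
    return Counter(w[:length] for w in headwords if len(w) >= length)


# Hypothetical sample; a real run would use the dictionary's headword list.
words = ["car", "card", "care", "cat", "dog", "dot"]

for n in range(1, 5):
    counts = prefix_match_counts(words, n)
    print(n, len(counts), max(counts.values()))
```

For each prefix length the loop prints the number of distinct prefixes and the size of the largest match set. Short prefixes produce few distinct keys with many matches each, while longer prefixes produce more keys with fewer matches.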
Conclusions and Future Work
Identified issues:
  Too many matches
  Lack of fielded searching
  Lack of OCR support
  No input method assistance
Collation challenges
Accessibility levels: Ordered Pages, Sparse, Full, and Location indexes, annotation, and digitization
Implemented a multilingual multi-dictionary explorer
Effort and prefix evaluation
In future: elastic index and automatic region estimation
GitHub:/urduweb/DictionaryExplorer