A Tajik Extension of the Multilingual Information Extraction System ZENON
Dr. Matthias Hecking
Tatjana Sarmina-Baneviciene
Fraunhofer Institute for Communication, Information Processing and Ergonomics FKIE
Neuenahrer Straße 20
53343 Wachtberg
Germany
2. The Multilingual ZENON System - V
• Multilingual information extraction with the ZENON system (a research prototype, not an operational system)
• Information about actions and named entities is identified in each sentence, and the content of the sentences is formally represented in typed feature structures.
• These structures can be combined and presented in a graphically navigable Entity-Action-Network.
• (Partial) information extraction from English HUMINT reports from the KFOR deployment, Dari texts, and Tajik texts.
• Also: a word-to-word translation to further support the analyst.
• GATE "is one of the most widely used human language processing systems in the world." It "comprises an architecture, framework (or SDK) and graphical development environment ..." and has been developed at the University of Sheffield since 1995.
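As a rough illustration of the representation described above, the following Python sketch models typed feature structures as nested, typed attribute-value records and derives a small entity-action network from them. The class `FeatureStructure`, the function `build_network`, the type and role names (`Meeting`, `Movement`, `agent`, `location`), and the sample entities are all assumptions made for this example; none of them is taken from the ZENON implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class FeatureStructure:
    """A typed attribute-value record; values may be nested structures."""
    type: str
    features: Dict[str, Union[str, "FeatureStructure"]] = field(default_factory=dict)


def build_network(actions: List[FeatureStructure]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each entity name to the (role, action type) pairs it occurs in."""
    network = defaultdict(list)
    for action in actions:
        for role, filler in action.features.items():
            # Only nested structures that carry a name are treated as entities.
            if isinstance(filler, FeatureStructure) and "name" in filler.features:
                network[filler.features["name"]].append((role, action.type))
    return dict(network)


# Hypothetical content of two sentences (invented sample data).
person = FeatureStructure("Person", {"name": "Person A"})
city = FeatureStructure("Location", {"name": "City B"})
actions = [
    FeatureStructure("Meeting", {"agent": person, "location": city}),
    FeatureStructure("Movement", {"agent": person, "tense": "past"}),
]

network = build_network(actions)
for entity, links in network.items():
    print(entity, links)
```

Because both actions share the same `person` structure, the entity ends up linked to both of them, which is the kind of connection an analyst would follow when navigating such a network.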
3. The Multilingual Tajik Extension - VIII
• The verb phrases must be analyzed as a basis for identifying the action.
• The Tajik VP-Chunker uses JAPE rules to analyze finite verb phrases (present, past, perfect, past perfect) and non-finite verb phrases, participles, adverbs, and negations (partial morphological analysis).
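The idea of rule-based verb phrase chunking can be sketched in Python as pattern matching over a sequence of tagged tokens. This is not the JAPE grammar used in ZENON: the tag names (`NEG`, `PART`, `AUX_PAST`, `V_FIN`, `N`), the three rules, and the English glosses standing in for Tajik tokens are all invented for illustration, and the sketch assumes that negation has already been split off as a separate token by an earlier analysis step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    tag: str  # hypothetical tag, not a ZENON or GATE tagset


@dataclass
class Chunk:
    label: str
    start: int  # index of the first token in the chunk
    end: int    # index one past the last token


# Ordered rules: the first rule that matches at a position wins.
# A trailing "?" marks an optional element.
RULES: List[Tuple[str, List[str]]] = [
    ("VP_PAST_PERFECT", ["NEG?", "PART", "AUX_PAST"]),
    ("VP_FINITE", ["NEG?", "V_FIN"]),
    ("VP_NONFINITE", ["NEG?", "PART"]),
]


def match_at(tokens: List[Token], i: int, pattern: List[str]) -> Optional[int]:
    """Return the end index if the pattern matches at position i, else None."""
    j = i
    for element in pattern:
        optional = element.endswith("?")
        tag = element.rstrip("?")
        if j < len(tokens) and tokens[j].tag == tag:
            j += 1
        elif not optional:
            return None
    return j if j > i else None


def chunk(tokens: List[Token]) -> List[Chunk]:
    """Scan left to right and emit one chunk per rule match."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(tokens):
        for label, pattern in RULES:
            end = match_at(tokens, i, pattern)
            if end is not None:
                chunks.append(Chunk(label, i, end))
                i = end
                break
        else:
            i += 1  # no rule matched here; move on to the next token
    return chunks


# Glossed toy input; the token texts are placeholders, not Tajik.
tokens = [
    Token("analyst", "N"), Token("not", "NEG"), Token("gone", "PART"),
    Token("was", "AUX_PAST"), Token("report", "N"), Token("reads", "V_FIN"),
]
chunks = chunk(tokens)
for c in chunks:
    print(c.label, [t.text for t in tokens[c.start:c.end]])
```

Listing the longer past perfect rule before the plain participle rule keeps a participle followed by an auxiliary from being chunked as a non-finite phrase. JAPE offers rule priorities and matching styles for the same purpose.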
• This presentation described the functionality for performing information extraction on Tajik texts in the multilingual ZENON system.
• We expect that systems like ZENON will increase the productivity of intelligence analysts, who could then analyze and combine information even from texts written in languages they do not understand.
• Possible improvements
  - greater coverage of the grammatical phenomena of Tajik
  - recognition of action types and the combination of actions with their semantic roles
• A more general problem is achieving the same coverage of linguistic data (e.g., dictionaries, grammars) and functionality (e.g., POS tagger, morphological analyzer) for rare languages (like Dari and Tajik).
• M. Hecking. Multilinguale Textinhaltserschließung auf militärischen Texten. In: Verteilte Führungsinformationssysteme. Michael Wunder, Jürgen Grosche (Hrsg.), Springer-Verlag, 2009.
• M. Hecking, C. Schwerdt. Multilingual Information Extraction for Intelligence Purposes. In: Proceedings of the 13th International Command and Control Research and Technology Symposium (ICCRTS), June 17-19, 2008, Bellevue, WA, U.S.A.
• M. Hecking. System ZENON – Semantic Analysis of Intelligence Reports. In: Proceedings of the LangTech 2008, February 28-29, 2008, Rome, Italy.
• C. Schwerdt. Analyse ausgewählter Verbalgruppen der Sprache Dari zur multilingualen Erweiterung des ZENON-Systems. FGAN, FKIE-Bericht Nr. 146, 2007.
• T. Sarmina-Baneviciene. Analyse spezifischer Probleme der tadschikischen Sprache zur multilingualen Erweiterung des ZENON-Systems. Fraunhofer FKIE, 2010 (forthcoming).