This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644753 (KConnect).
Medical-domain Machine Translation in KConnect
Pavel Pecina
Charles University, Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Czech Republic
Apr 4th, 2017 – QT21 workshop, Valencia, Spain
Outline
● Context of the project (Khresmoi)
● Project details, goals and objectives
● Role of MT in the project
● Industry requirements/constraints
● Solutions and tools
● Prototypes/Demos
● What is still needed
Khresmoi
● „Collect and make sense of biomedical information, then make it freely and easily available in several languages.“
● FP7-ICT, No. 257528, Collaborative project
● Total cost: EUR ~10M, 2010/09-2014/08
● Topic: ICT-2009.4.3 - Intelligent Information Management
● Coordinator: Henning Müller, University of Applied Sciences Western Switzerland, Sierre
● Effective automated information extraction from (unstructured) biomedical documents
● Linking information extracted from unstructured biomedical texts/images to structured information in knowledge bases
● Support of cross-language search, including multi-lingual queries, and returning machine-translated pertinent excerpts
● Adaptive user interfaces to assist in formulating queries and display search results via ergonomic/interactive visualizations
● Automated analysis and indexing for medical images
Khresmoi results (MT related)
● MT component to allow cross-lingual search and access
● Based on Moses and domain-adaptation techniques
● Deployed as (cloud-based) web-service
● Translation in two „modes“:
– Translation of search queries from the user languages to the document languages (query translation)
– Translation of sentences from automatically created summaries of medical documents (summary translation)
● Languages: Czech, German, French ↔ English
KConnect – a follow-up of Khresmoi
● „Development and commercialization of cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications.“
● H2020 project, No. 644753, Innovation action
● Total cost: EUR ~4M, 2015/02–2017/07
● Topic: ICT-15-2014 Big data and open data innovation and take-up
● Coordinator: Allan Hanbury, Technical University in Vienna
– Technische Universitaet Wien (Austria) – coordination
– University of Sheffield (United Kingdom)
– King’s College London (United Kingdom)
– Charles University, Prague (Czech Republic)
● Industry:
– Findwise AB (Sweden)
– Precognox Informatikai Kft (Hungary)
– Ontotext AD (Bulgaria)
– Trip Database Ltd (United Kingdom)
– Health on the Net Foundation (Switzerland)
– Jönköpings Län (Sweden)
KConnect objectives
● Productisation of the multilingual medical text processing tools developed in Khresmoi.
● Creating a professional services community of companies trained to build solutions based on the KConnect Services.
● Development of toolkits for straightforward adaptation of the commercialised services to new languages.
● Adapting the services to Electronic Health Records processing, which is particularly challenging due to misspellings, neologisms, organisation-specific acronyms, etc.
● Languages: Hungarian, Polish, Spanish, Swedish ↔ English
MT Application Scenarios
1. Query translation
– Translation of medical/health-related search queries from a user language to the document language(s)
– Queries are usually non-grammatical, short sequences of terms
– Lay-people queries vs. expert queries
2. Summary translation
– Sentences taken from automatically created abstracts of medical documents, translated back to the user language
– Usually longer, highly informative sentences
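The two scenarios differ mainly in translation direction and in the kind of input text. A minimal sketch of how a client might route requests between them is given below. All names here (Mode, TranslationRequest, direction, model_name) are hypothetical and only illustrate the idea; they are not the KConnect or Khresmoi service API, and the "one model per mode and direction" naming scheme is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    """The two application scenarios (hypothetical labels)."""
    QUERY = "query"      # short, often non-grammatical search queries
    SUMMARY = "summary"  # longer sentences from automatic summaries


@dataclass(frozen=True)
class TranslationRequest:
    text: str
    mode: Mode
    user_lang: str         # language of the person searching, e.g. "cs"
    doc_lang: str = "en"   # language of the indexed documents (assumed English)


def direction(request: TranslationRequest) -> Tuple[str, str]:
    """Return (source, target) language codes for a request.

    Queries go from the user language to the document language;
    summary sentences go back from the document language to the user.
    """
    if request.mode is Mode.QUERY:
        return (request.user_lang, request.doc_lang)
    return (request.doc_lang, request.user_lang)


def model_name(request: TranslationRequest) -> str:
    """Build an identifier for a mode- and direction-specific system.

    Assumption: a separate system is kept per mode and direction.
    """
    src, tgt = direction(request)
    return f"{request.mode.value}-{src}-{tgt}"
```

For example, a Czech user's search query would be routed to a "query-cs-en" system, while a sentence from an English summary shown to the same user would go to "summary-en-cs". Keeping the mode in the routing key reflects the point above that queries and summary sentences are very different kinds of text.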