AFNLP 2008 Meeting
Indonesia Country Report
Hammam Riza
Agency for the Assessment and Application of Technology (BPPT)
Ministry of Research and Technology
Republic of Indonesia
Dec 17, 2015
TOC
• Past Activities
• Activities in 2007
• Activities Plan 2008, 2009
• National Language Year 2008
Past NLP Research Projects in Indonesia
• Indonesian Text-To-Speech (BPPT, ITB, UI)
• GDA/MMA/Linguistic-DS MPEG-7 (Multimedia Annotation)
• Cross-Linguistic Portal (dictionaries, corpus, tools)
• Web translator (WebTRans)
• Standard Indonesian Language Corpus (SILC)
• Indonesian Language Dictionaries Project (KBBI)
• English-Indonesia Parallel Corpus (INCI)
• Speech recognition/synthesis system (Bandung Institute of Technology/Telkom RDC/University of Indonesia)
• Information retrieval (ITB and University of Indonesia)
• Text/Image processing tools (Gajah Mada University)
• Computational lexicon (National Language Center)
• Computational morphology (Atmajaya University)
Promotion of Language Technologies (2007)
• National Language Congress XII in Solo: introduced a toolkit for building speech databases for endangered languages
• Atmajaya Language Workshop (June 2007) in Jakarta: promoted local computing policy and speech technologies
• Both keynote speeches were given by Dr. Hammam Riza
• Promotion of the Context Sensitive Dictionary Project for a Speech Translation Corpus for the Aceh Tsunami Region (Indonesian-Acehnese, bidirectional)
Activities in Machine Translation (2006-2007)
• A rule-based Indonesian-English translator (started in 2006) was launched to the market by ITB in June 2007
• The translator is combined with an English TTS (Windows) and an Indonesian TTS (proprietary)
• Experiment in statistical MT using the Pharaoh decoder (Eng-Indo parallel corpus) by
Current Activities in Speech Tech
• Telkom RDC & BPPT collaboration on Speech Recognition and Summarization
• Indonesia Goes Open Source (IGOS) speech recognition system (funded by Ministry of Research and Technology)
• Speech recognition system for Bahasa Indonesia (University of Indonesia)
– Transcribing speech data that contains broadcast TV and Radio news
– Applications:
  • sending short message service (SMS)
  • IVR (health and tourism services)
• Research on “intonation by example” and an “automatic prosody pattern extractor” using an Artificial Neural Network (ANN)
• Text to Speech system for local languages (ITB/UI)
100th Year of Bahasa Indonesia – National Language Year 2008
• Series of events culminating in the International Conference on Bahasa Indonesia (Oct 2008)
• Importance of Indonesian: its roles and functions in national life and development (policy making, business, media, education)
• Language planning (shaping change)
• 6 keynote speakers from AFNLP will be invited by the Indonesian government throughout the year
Major Activities for 2008
• Local Language Resource Projects (Language Center)
• Indonesian and Local Languages - Wordnet
• MALINDO (Malaysia-Indonesia) joint projects
• Speech to speech translation for Asian languages (A-STAR)
• Speech database Telkom RDC/BPPT (APT support)
• Language Resources and Translation English-Indonesia (collaboration with PAN Localization)
• Speech Corpus for Local Languages (Endangered Languages), using BLARK (ELDA)
Activities Plan for 2008-2009
• Speech Recognition and Phrase-based Statistical Machine Translation (SMT) system for bidirectional Indonesian-English and Indonesian-Japanese
• Mapping and SMT for Indonesian-Regional Languages (Bahasa Nusantara) and for German, French, Chinese and Arabic (cross-border languages)
• Information Retrieval (cross-language speech retrieval)
– Searching and retrieving Indonesian speech data
• Topic Detection and Tracking (TDT)
– Identifying topics in the speech data collection
– Classifying new data into the existing topics in the collection
• Speech Synthesis
• Speech Summarization
– Summarizing Indonesian speech documents
E-dictionary project
National Language Center
• Size & Comprehensiveness: 200,000 entries; many subject areas are covered
• Method: corpus-based; primary data for largest print dict
• Usefulness: find the words you need; definitions and examples are helpful
• Users: writers, journalists, editors, scientists, academics, teachers, students, business people, lawyers, etc.
Kamus Besar Bahasa Indonesia (KBBI) 3rd ed.
Echols & Shadily’s Eng-Ind. dictionary.
Indonesia has at least 13 major local languages with at least one million speakers each:
• Javanese (75,200,000)
• Sundanese (27,000,000)
• Malay (20,000,000)
• Madurese (13,694,000)
• Minangkabau (6,500,000)
• Batak (5,150,000)
• Buginese (4,000,000)
• Balinese (3,800,000)
• Acehnese (3,000,000)
• Sasak (2,100,000)
• Makassarese (1,600,000)
• Lampung (1,500,000)
• Rejang (1,000,000)
ACEH – 32 local languages
EAST JAVA – 6 local languages
LOCAL & CROSS-BORDER LANGUAGES
[Chart: South East Asia, by country (bn, id, kh, la, my, mm, ph, sg, th, tp, vn); vertical axis 0% to 100%; series: % Local Languages, % English, % Other Cross-Border Languages]
Note: Cross-Border Languages in Indonesia: English, Arabic, Chinese, French, German, Dutch, Japanese, etc.
Language Digital Divide
Language Preservation
• Survey of indigenous local languages
• Local computing policy will be developed for major local languages
• Endangered languages are identified and preserved by means of ICT
• Language resources collection for official and major local languages
Thank You
Any comments please mail