Broadcast Technology No.58, Autumn 2014 © NHK STRL

NHK’s web service “NEWSWEB EASY” (http://www.nhk.or.jp/news/easy/) provides news stories written in simplified Japanese for non-native residents of Japan and for children. Each item is a rewrite of a news script originally written in ordinary Japanese, and various hints are attached to enhance readers’ comprehension. STRL supports the production of the service through its research on natural language processing technologies, and we recently built a new production support system (Figure).

NEWSWEB EASY articles are written through a collaboration between two people: an experienced news writer, who edits the original news script into simpler compositions, and a Japanese language instructor specializing in teaching Japanese to non-native speakers, who paraphrases unfamiliar expressions and complicated sentence structures to suit the target readers. To help these rewriters replace difficult words with more appropriate ones, our system color-codes every word in the manuscript they are writing to indicate its vocabulary type (e.g., personal or place names) and its difficulty level.

The system then tags each word in the finished rewrite with various information. After errors in the automatic tags are corrected manually, the simplified Japanese news items are published with hints generated from the tags, such as the pronunciations of kanji words and plain explanations of difficult words. The automatic tagging technology is based on machine learning, which acquires knowledge from the manually corrected tags produced in daily production. Our new technology achieves approximately 95% accuracy in automatic tagging, so only minor manual corrections to the automatically generated tags are needed.
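The color-coding step and the accuracy figure can be illustrated with a short sketch. The lexicon entries, the label-to-color mapping, and the policy of flagging unknown words as difficult are all hypothetical assumptions; the article does not describe how the system's dictionary or editor display actually works. Accuracy is assumed here to mean the fraction of automatic tags that survive manual correction unchanged.

```python
# Hypothetical lexicon mapping each word to one label. The label names
# follow the article's figure; the entries themselves are illustrative.
LEXICON = {
    "日本": "place name",
    "は": "basic word",
    "大久保": "person name",
    "嘉人": "person name",
    "選手": "semi-difficult word",
    "など": "basic word",
}

# Hypothetical display color for each label in the rewriters' editor.
COLORS = {
    "place name": "green",
    "person name": "blue",
    "basic word": "black",
    "semi-difficult word": "orange",
    "difficult word": "red",
}


def color_code(words, lexicon=LEXICON):
    """Pair each word with a display color.

    Words missing from the lexicon are treated as difficult (an assumed
    policy) so that the rewriter is prompted to check them.
    """
    return [(w, COLORS[lexicon.get(w, "difficult word")]) for w in words]


def tagging_accuracy(automatic, corrected):
    """Fraction of automatic tags left unchanged by manual correction."""
    if len(automatic) != len(corrected) or not corrected:
        raise ValueError("tag sequences must be non-empty and equal length")
    unchanged = sum(a == c for a, c in zip(automatic, corrected))
    return unchanged / len(corrected)


auto = ["place name", "basic word", "person name", "basic word"]
fixed = ["place name", "basic word", "person name", "person name"]
print(tagging_accuracy(auto, fixed))  # 0.75
```

Under this per-tag reading, 95% accuracy means that roughly 5 of every 100 tags need a manual fix.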
Because the system learns incrementally from the manually corrected tags (stream learning), its knowledge and accuracy increase day by day. The system also has a number of other functions, such as searching past rewrites, and it has proven useful in daily production.

NEWSWEB EASY currently publishes about five news items a day. We are conducting various studies, including ones on new ways to assist rewriters, aimed at making it easier for the service to provide more simplified Japanese news items.

Natural Language Processing Technology to Support Simplified Japanese News Service “NEWSWEB EASY”
Tadashi Kumano, Human Interface Research Division

[Figure: NEWSWEB EASY production support using automatic tagging technology. Rewrite work on the original news script uses vocabulary type/difficulty color-coding (including geographical and personal names) and a difficulty level confirmation screen that checks the manuscript currently being edited. Tag work applies automatic tagging based on word tagging knowledge; errors are corrected manually in a tag editor, and the system learns incrementally from the corrected tags. The NEWSWEB EASY online page shows the resulting hints: pronunciations and explanations of difficult vocabulary. Example: for 日本 は 大久保 嘉人 選手 など (“Japan … players such as Yoshito Okubo”), the word type/difficulty tags are place name, basic word, person name, person name, semi-difficult word, basic word, and the pronunciations include にっぽん, おおくぼ, よしと, せんしゅ. Sample article headline: “World Cup: Japan draws 0-0 with Greece.”]
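The incremental (stream) learning described in the article can be sketched with an online perceptron that updates its weights each time a manually corrected tag arrives. The article does not say which learning algorithm the system uses, so the perceptron, its two features (the word itself and its last character), and the class and method names below are hypothetical choices made for illustration.

```python
from collections import defaultdict


class StreamTagger:
    """Minimal online (perceptron-style) word tagger.

    A hypothetical stand-in for the article's unspecified method: it learns
    one corrected (word, tag) pair at a time, so knowledge accumulates as
    corrected tags arrive from daily production.
    """

    def __init__(self, labels):
        self.labels = list(labels)
        # weights[(feature, label)] -> contribution to that label's score
        self.weights = defaultdict(float)

    @staticmethod
    def features(word):
        # Assumed features: the word itself and its last character.
        return [f"w={word}", f"suf={word[-1]}"]

    def predict(self, word):
        feats = self.features(word)
        # Highest summed weight wins; max() keeps the first label on ties.
        return max(self.labels,
                   key=lambda y: sum(self.weights[(f, y)] for f in feats))

    def learn(self, word, gold):
        """Tag the word, then update from its manually corrected tag."""
        guess = self.predict(word)
        if guess != gold:
            for f in self.features(word):
                self.weights[(f, gold)] += 1.0
                self.weights[(f, guess)] -= 1.0
        return guess

    def learn_batch(self, corrected_items):
        """Process one day's corrected pairs in arrival order.

        Returns how many automatic tags were wrong before each update.
        """
        return sum(self.learn(w, g) != g for w, g in corrected_items)


tagger = StreamTagger(["place name", "person name", "basic word", "semi-difficult word"])
day = [("日本", "place name"), ("選手", "semi-difficult word"), ("大久保", "person name")]
print(tagger.learn_batch(day))  # 2
print(tagger.learn_batch(day))  # 0
```

Replaying the same corrected words on a second "day" produces no errors, a miniature version of accuracy improving as corrected tags accumulate. A production tagger would use richer context features and would first need to segment Japanese text into words, which this sketch assumes has already been done.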