A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung Tran, Kiem-Hieu Nguyen, Duc-Hanh Bui
Hanoi University of Science and Technology
Friday, October 7, 16
Apr 13, 2017
Outline
Statistical language model
Current state of the art
RNN for Vietnamese language model
Experimental results
Conclusion
Statistical language model
A probability distribution over word sequences
E.g. “go to the airport”
P(“airport” | “go to the”) = ?
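In the simplest case, the conditional probability above is estimated by relative frequency (maximum likelihood): P(w | h) = count(h w) / count(h). A minimal sketch over a hypothetical toy corpus, with no smoothing, so any continuation never seen after the history gets probability zero. The function names are illustrative.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count all n-grams (as word tuples) in whitespace-tokenized sentences."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def mle_prob(word, history, sentences):
    """P(word | history) = count(history + word) / count(history followed by any word)."""
    history = tuple(history.split())
    full = ngram_counts(sentences, len(history) + 1)
    # Count the history only where some next word follows, so the estimates sum to 1
    context = sum(c for gram, c in full.items() if gram[:-1] == history)
    return full[history + (word,)] / context if context else 0.0


# Hypothetical toy corpus
corpus = ["go to the airport", "go to the school", "go to the school"]
print(mle_prob("airport", "go to the", corpus))  # 0.3333333333333333
```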
Applications:
Spelling checkers, smart keyboards (e.g. Laban Key)
Improving speech recognition and machine translation
Challenges
Meaningful
grammatically correct
understandable
Context-aware
E.g. “I am from Vietnam. My mother tongue is Vietnamese.” (predicting “Vietnamese” requires remembering “Vietnam” from the previous sentence)
Out-of-vocabulary words
Slang, abbreviations, etc.
Common approach: n-gram language model
Katz's back-off: estimates the conditional probability of a word given its history in the n-gram
When the trigram is unseen, back off to the bigram, then to the unigram
Source: https://en.wikipedia.org/wiki/Katz%27s_back-off_model
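The back-off idea can be sketched as follows. This is a simplified variant, not Katz's exact model: it assumes a fixed absolute discount (the hypothetical `discount` parameter) in place of the Good-Turing discounting that Katz uses, and it computes the back-off weight alpha(h) so that each conditional distribution still sums to one. The class name, corpus, and parameter values are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Ngram = Tuple[str, ...]


class BackoffLM:
    """Katz-style back-off n-gram model that recurses down to unigrams.

    Assumption: a fixed absolute discount replaces Katz's Good-Turing discounting.
    """

    def __init__(self, sentences: Iterable[str], order: int = 3, discount: float = 0.5):
        assert 0.0 < discount < 1.0, "discount must keep seen counts positive"
        self.order = order
        self.d = discount
        self.counts: Counter = Counter()          # c(g) for every n-gram g, n = 1..order
        self.context_counts: Counter = Counter()  # c(h) = sum over w of c(h w)
        self.continuations: Dict[Ngram, Set[str]] = {}  # words observed after h
        for sent in sentences:
            tokens = sent.split()
            for n in range(1, order + 1):
                for i in range(len(tokens) - n + 1):
                    gram = tuple(tokens[i:i + n])
                    hist = gram[:-1]
                    self.counts[gram] += 1
                    self.context_counts[hist] += 1
                    self.continuations.setdefault(hist, set()).add(gram[-1])
        self.total = self.context_counts[()]  # number of word tokens

    def prob(self, word: str, history: Iterable[str]) -> float:
        history = tuple(history)
        # Keep only the last order-1 words of the history
        history = history[-(self.order - 1):] if self.order > 1 else ()
        if not history:
            return self.counts[(word,)] / self.total  # unigram relative frequency
        c_h = self.context_counts[history]
        if c_h == 0:
            return self.prob(word, history[1:])  # history never seen: back off directly
        c_hw = self.counts[history + (word,)]
        if c_hw > 0:
            return (c_hw - self.d) / c_h  # discounted estimate for a seen n-gram
        # Mass removed by discounting is redistributed via the shorter history
        seen = self.continuations[history]
        freed = self.d * len(seen) / c_h
        lower_seen = sum(self.prob(w, history[1:]) for w in seen)
        if lower_seen >= 1.0:
            return 0.0  # no lower-order mass left for unseen words
        alpha = freed / (1.0 - lower_seen)  # back-off weight alpha(h)
        return alpha * self.prob(word, history[1:])


corpus = ["go to the airport", "go to the school", "go to the market", "we go to the school"]
lm = BackoffLM(corpus, order=3, discount=0.5)
print(lm.prob("school", ["go", "to", "the"]))  # (2 - 0.5) / 4 = 0.375
print(lm.prob("we", ["to", "the"]))            # unseen trigram: backs off to bigram, then unigram
```

Because the discounted mass is handed to the shorter history only for words never seen after the longer one, a word is never counted twice, which is what separates back-off from simple interpolation.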
N-gram language model limitations
Only sees a few words back
Only predicts words seen in the same context
Deep learning for NLP
Word embedding (Socher et al. 2013a; Mikolov et al. 2013b)
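A word embedding maps each vocabulary index to a dense vector, i.e. a row of a learned matrix E; looking up a row is the same as multiplying a one-hot vector by E. A minimal NumPy sketch, assuming a hypothetical five-word vocabulary and random (untrained) vectors, so the similarity value it prints is not meaningful.

```python
import numpy as np

vocab = ["go", "to", "the", "school", "airport"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 4))  # embedding matrix: one 4-d row per word (random here)


def embed(word):
    """Embedding lookup: the row of E for this word."""
    return E[word_to_id[word]]


def cosine(u, v):
    """Cosine similarity, a common way to compare embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


one_hot = np.eye(len(vocab))[word_to_id["school"]]
assert np.allclose(one_hot @ E, embed("school"))  # row lookup == one-hot times E
print(cosine(embed("school"), embed("airport")))
```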
Recurrent neural network for text
Input: go to the
Output: to the school
Probability: P(school | go to the)
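The input/output pair above is the usual next-word setup: the target sequence is the input shifted by one word, and at each step the RNN outputs a distribution over the next word. A minimal NumPy sketch of a plain (Elman) RNN forward pass, assuming a hypothetical five-word vocabulary, one-hot inputs, and random untrained weights; the deck itself goes on to use LSTM cells.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


vocab = ["go", "to", "the", "school", "airport"]
idx = {w: i for i, w in enumerate(vocab)}

sentence = ["go", "to", "the", "school"]
inputs = sentence[:-1]   # go to the
targets = sentence[1:]   # to the school

rng = np.random.default_rng(0)
V, H = len(vocab), 8     # hypothetical vocabulary and hidden sizes
W_xh = rng.normal(0, 0.1, (H, V))
W_hh = rng.normal(0, 0.1, (H, H))
W_hy = rng.normal(0, 0.1, (V, H))
b_h, b_y = np.zeros(H), np.zeros(V)

h = np.zeros(H)
step_probs = []
for word, target in zip(inputs, targets):
    x = np.eye(V)[idx[word]]                # one-hot input
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # hidden state summarizes all earlier words
    y = softmax(W_hy @ h + b_y)             # distribution over the next word
    step_probs.append(y[idx[target]])

p_school = step_probs[-1]  # P(school | go to the), untrained so not meaningful
print(p_school)
```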
RNN vs. n-gram
Variable-length word context folded into the recurrent state vs. fixed n-gram context
Personalization through continuous learning
More meaningful text suggestions
Natural support for phrase and term suggestions
RNN for Vietnamese language model
Character-level language model
{previous characters} -> next character
Syllable-level language model
{previous syllables} -> next syllable
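The two granularities differ only in how training pairs are cut from the text. A minimal sketch on a hypothetical sentence; the NFC normalization step is an assumption (not stated in the slides) so that each accented Vietnamese letter such as "ọ" counts as one character rather than a base letter plus combining marks.

```python
import unicodedata


def char_pairs(text):
    """{previous characters} -> next character, for every position in the text."""
    # Assumption: NFC normalization, so precomposed Vietnamese letters are single code points
    text = unicodedata.normalize("NFC", text)
    return [(text[:i], text[i]) for i in range(1, len(text))]


def syllable_pairs(text):
    """{previous syllables} -> next syllable; Vietnamese writes syllables space-separated."""
    syllables = unicodedata.normalize("NFC", text).split()
    return [(syllables[:i], syllables[i]) for i in range(1, len(syllables))]


sentence = "tôi đi học"  # "I go to school"
print(len(char_pairs(sentence)))  # 9 (10 characters, spaces included)
print(syllable_pairs(sentence))   # [(['tôi'], 'đi'), (['tôi', 'đi'], 'học')]
```

The syllable model sees far shorter sequences (2 pairs versus 9 here) but needs a much larger output vocabulary.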
LSTM cell
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
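For reference, the standard LSTM cell in the notation of the cited post, where σ is the logistic sigmoid, ⊙ is element-wise multiplication, and [h_{t-1}, x_t] is concatenation. The slides do not say which LSTM variant the authors used (e.g. with or without peephole connections), so this basic form is an assumption.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```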
Stacking multiple layers
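Stacking means the hidden state of layer l at time t becomes the input of layer l+1 at the same time step; only the top layer feeds the softmax over the vocabulary. A NumPy sketch of the forward pass, assuming the basic LSTM cell (gate pre-activations stacked in the order forget, input, candidate, output), hypothetical layer sizes, and random untrained weights. It is not the authors' configuration, which the slides do not give.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One basic LSTM step; W maps [h_prev, x] to the four gate pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[:H])                 # forget gate
    i = sigmoid(z[H:2 * H])            # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])             # output gate
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c


class StackedLSTMLM:
    """Hypothetical stacked-LSTM language model with random, untrained weights."""

    def __init__(self, vocab_size, embed_dim=16, hidden_dim=32, num_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        self.embed = rng.normal(0, 0.1, (vocab_size, embed_dim))
        self.layers = []
        in_dim = embed_dim
        for _ in range(num_layers):
            W = rng.normal(0, 0.1, (4 * hidden_dim, hidden_dim + in_dim))
            self.layers.append((W, np.zeros(4 * hidden_dim)))
            in_dim = hidden_dim  # every layer above the first reads a hidden state
        self.W_out = rng.normal(0, 0.1, (vocab_size, hidden_dim))

    def next_token_probs(self, token_ids):
        """Feed the prefix through every layer; return P(next token | prefix)."""
        H = self.hidden_dim
        states = [(np.zeros(H), np.zeros(H)) for _ in self.layers]
        for t in token_ids:
            x = self.embed[t]
            for k, (W, b) in enumerate(self.layers):
                h_prev, c_prev = states[k]
                states[k] = lstm_step(x, h_prev, c_prev, W, b)
                x = states[k][0]  # this layer's output is the next layer's input
        logits = self.W_out @ states[-1][0]  # only the top layer feeds the softmax
        logits = logits - logits.max()       # numerical stability
        p = np.exp(logits)
        return p / p.sum()


vocab = ["go", "to", "the", "school", "airport"]
model = StackedLSTMLM(vocab_size=len(vocab), num_layers=2)
probs = model.next_token_probs([vocab.index(w) for w in ["go", "to", "the"]])
print(dict(zip(vocab, probs.round(3))))  # untrained weights: values are not meaningful
```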
Experiments
1,500 movies, 2,056,308 sentences
Experimental results
Conclusion
First neural language model for Vietnamese
Largest experimental dataset
Future work:
Word embedding
Neural net compression
Conversational neural machine translation
Thank you for your attention
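Conversational
Chú hoài linh đẹp trai. (Uncle Hoài Linh is handsome.)
Chú hoài linh (Uncle Hoài Linh)
Chào buổi sáng (Good morning)
chị hát hay wa!! nghe thick a. (You sing so well!! I love listening.)
chị khởi my ơi e rất la hâm mộ (Sister Khởi My, I'm a big fan.)
chú hoài linh thật đẹp zai và chú Trấn thành đẹp qá (Uncle Hoài Linh is really handsome and Uncle Trấn Thành is so handsome.)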
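lịch sử ghi nhớ năm 1979 (history remembers the year 1979)
tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn Kiệt (at the conference, comrade Phạm Ngọc Thủy Võ Văn Kiệt)
tại hội nghị, đồng chí Hồ Chí Minh nói (at the conference, comrade Hồ Chí Minh said)
tại hội nghị, đồng chí Võ Nguyên Giáp và đồng chí Hồ Chí Minh đã ngồi ở tại đại hội Đảng lần thứ nhất vào năm 1945, (at the conference, comrade Võ Nguyên Giáp and comrade Hồ Chí Minh sat at the first Party Congress in 1945,)
Ngay từ những ngày đầu, (From the very first days,)
Đúng như nhận xét của Giáo sư Nguyễn Văn Linh (Just as Professor Nguyễn Văn Linh remarked)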