Area Attention
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
Google Research
Neural Attentional Mechanisms
[Figure: a query attends over a memory of key-value pairs (k1, v1), (k2, v2), ..., (k|M|, v|M|), producing attention weights a1, a2, ..., a|M|.]
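As a concrete reference point, here is a minimal NumPy sketch of the soft attention depicted above. The dot-product scoring function is an assumption (the poster does not fix a particular score), and the function name is ours.

```python
import numpy as np

def soft_attention(query, keys, values):
    """A query attends over a memory of (key, value) pairs.

    query:  [d]      query vector
    keys:   [M, d]   one key per memory item
    values: [M, dv]  one value per memory item
    """
    scores = keys @ query             # similarity of the query to each key (assumed dot product)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                   # attention weights a1 ... a|M| (softmax)
    return a @ values                 # attention-weighted sum of values
```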
Neural Machine Translation
[Figure: encoder-decoder attention in sequence-to-sequence translation; while emitting each output token, the decoder attends over the encoder states for the input tokens A B C D.]
Bahdanau, Cho & Bengio, ICLR’15; Luong, Pham & Manning, ACL’15
Image Captioning
Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel & Bengio, ICML’15; Sharma, Ding, Goodman & Soricut, ACL’18
[Figure: attention in image captioning; while emitting each caption token, the decoder attends over the image grid cells.]
Attention-Based Architectures
Transformer (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin, NIPS’17)
[Figure: self-attention in the Transformer over the input sequence A B C D; each query attends over a memory of key-value pairs (k1, v1), ..., (k|M|, v|M|) with weights a1, ..., a|M|.]
Limitations
The unit of attention is predetermined rather than learned.
Word: Airlines began charging for the first and second checked bags
Character: A r e y o u a t h o m e ?
Image grid cell: [Figure: an image split into fixed-size grid cells]
Research Goal
Enable a model to attend to information at varying granularity. The unit of attention emerges from learning.
Characters → Words: A r e y o u a t h o m e ?
Grid cells → Objects: [Figure: grid cells grouping into objects in an image]
Words → Phrases: Airlines began charging for the first and second checked bags
1D Area Attention
[Figure: 1D area attention; adjacent items of the original memory are combined into an area memory, and the query attends over 1-item areas, 2-item areas, and 3-item areas.]
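A minimal sketch of how the 1D area memory could be enumerated, assuming the mean/sum area features listed under "Features of Each Area" below (key of an area = mean of its item keys, value of an area = sum of its item values). Function and parameter names are ours; the actual implementation is in the tensor2tensor code linked at the end.

```python
import numpy as np

def one_d_area_memory(keys, values, max_area=3):
    """Build area keys/values from all spans of up to max_area adjacent items.

    keys, values: [M, d] original memory.
    Key of an area   = mean of its item keys (assumed here).
    Value of an area = sum of its item values (assumed here).
    """
    M = len(keys)
    area_keys, area_values = [], []
    for size in range(1, max_area + 1):          # 1-item, 2-item, 3-item areas
        for start in range(M - size + 1):
            area_keys.append(keys[start:start + size].mean(axis=0))
            area_values.append(values[start:start + size].sum(axis=0))
    return np.stack(area_keys), np.stack(area_values)
```

The query then attends over (area_keys, area_values) exactly as in the standard attention sketch above; with M = 4 and max_area = 3 this yields 4 + 3 + 2 = 9 candidate areas.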
2D Area Attention
[Figure: 2D area attention; on a two-dimensional memory (e.g., image grid cells), areas are rectangular regions, and the query attends over 1x1, 1x2, 2x1, and 2x2 areas.]
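The same enumeration extends to 2D by sweeping rectangle heights and widths. Again a sketch under the mean-key / sum-value assumption, not the tensor2tensor implementation:

```python
import numpy as np

def two_d_area_memory(keys, values, max_h=2, max_w=2):
    """Build area keys/values from all rectangles up to max_h x max_w.

    keys, values: [H, W, d] original memory, e.g., image grid cells.
    """
    H, W, _ = keys.shape
    area_keys, area_values = [], []
    for h in range(1, max_h + 1):
        for w in range(1, max_w + 1):            # 1x1, 1x2, 2x1, 2x2 areas
            for i in range(H - h + 1):
                for j in range(W - w + 1):
                    k = keys[i:i + h, j:j + w].reshape(h * w, -1)
                    v = values[i:i + h, j:j + w].reshape(h * w, -1)
                    area_keys.append(k.mean(axis=0))
                    area_values.append(v.sum(axis=0))
    return np.stack(area_keys), np.stack(area_values)
```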
Features of Each Area
Area Features
Mean
Sum
Max
Standard deviation
Area shape, e.g., 2x2
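For illustration, a small sketch computing the feature list above for a single area. How these raw features are combined into the final area key is not specified on the poster and is left out here; see the linked tensor2tensor code for the real implementation.

```python
import numpy as np

def area_feature_vectors(item_keys, height, width):
    """Per-area features listed above, for one area.

    item_keys: [n, d] keys of the items inside the area (n = height * width).
    Returns the raw features; combining them into the final area key
    (e.g., via learned projections) is left to the real implementation.
    """
    return {
        "mean": item_keys.mean(axis=0),
        "sum": item_keys.sum(axis=0),
        "max": item_keys.max(axis=0),
        "std": item_keys.std(axis=0),
        "shape": (height, width),  # area shape, e.g., 2x2
    }
```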
Area Attention Consistently Improves upon Transformer & LSTM
[Result plots: Transformer machine translation, LSTM machine translation, and Transformer image captioning.]
Poster session: Tue Jun 11th, 06:30-09:00 PM @ Pacific Ballroom #27
Source code: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/area_attention.py