Area Attention11-16-00... · 2019-06-08 · Features of Each Area original memory area memory query 1x1 areas 1x2 areas 2x1 areas 2x2 areas Area Features Mean Sum Max Standard deviation

Area Attention

Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

Google Research

Neural Attentional Mechanisms

[Figure: a query attends over a memory of key-value pairs (k1, v1), (k2, v2), (k3, v3), …, (k|M|, v|M|), producing attention weights a1, a2, a3, …, a|M|.]
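The mechanism on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming a dot-product score between the query and each key (the slide does not fix the scoring function) and a softmax that turns the scores into the weights a1, …, a|M|. The helper names `softmax`, `dot` and `attend` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float],
           keys: List[List[float]],
           values: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Attend over a memory of (k_i, v_i) pairs (dot-product score assumed).

    Returns the weights a_1..a_|M| and the weighted sum of the values.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    output = [sum(a * v[j] for a, v in zip(weights, values)) for j in range(dim)]
    return weights, output


# Usage: the query matches keys 1 and 3 equally well, key 2 less.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, output = attend([1.0, 0.0], keys, values)
```

The weights always sum to 1, so the output is a convex combination of the value vectors.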

Neural Machine Translation

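[Figure: an encoder-decoder translates source tokens A B C D into target tokens X Y EOS; at each decoding step the decoder attends over the source positions with weights a1, a2, a3, a4.]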

Bahdanau, Cho & Bengio, ICLR’15

Luong, Pham & Manning, EMNLP’15

Image Captioning

Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel & Bengio, ICML’15

Sharma, Ding, Goodman & Soricut, ACL’18

[Figure: the caption decoder emits X Y EOS while attending over image grid cells with weights a1, a2, a3, a4.]

Attention-Based Architectures

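Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin, NIPS’17

Transformer

[Figure: in the Transformer, input tokens A B C D attend to one another; each query attends over memory items (k1, v1), (k2, v2), (k3, v3), …, (k|M|, v|M|) with weights a1, a2, a3, …, a|M|.]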
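The Transformer (Vaswani et al.) uses scaled dot-product attention, softmax(QKᵀ/√d_k)V, in which every position issues its own query. Below is a dependency-free, single-head sketch; the learned projections and multiple heads are omitted, and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Single-head attention: each row of Q attends over all rows of K and V."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Dot-product scores, scaled by 1/sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Numerically stable softmax over the memory positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value rows.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs


# Self-attention: queries, keys and values all come from the same sequence.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = scaled_dot_product_attention(X, X, X)
```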

Limitations

The unit of attention is predetermined rather than learned.

Word as the unit: Airlines began charging for the first and second checked bags

Character as the unit: A r e y o u a t h o m e ?

Image grid cell as the unit: [figure of an image divided into grid cells]

Research Goal

Enable a model to attend to information at varying granularity. The unit of attention emerges from learning.

Characters → words: A r e y o u a t h o m e ?

Grid cells → objects

Words → phrases: Airlines began charging for the first and second checked bags

1D Area Attention

[Figure: the original memory is expanded into an area memory of 1-item, 2-item and 3-item areas, each a group of adjacent items; the query attends over the area memory.]
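A minimal sketch of 1D area attention in plain Python. It assumes the basic form described in the Area Attention paper as we understand it: an area is a run of 1 to `max_area_size` adjacent items, its key is the mean of the item keys, its value is the sum of the item values, and the query attends over all areas with dot-product scores and a softmax. The names `build_1d_areas` and `area_attention_1d` are hypothetical; they are not the tensor2tensor API.

```python
import math
from typing import List, Tuple

Vector = List[float]


def build_1d_areas(keys: List[Vector], values: List[Vector], max_area_size: int):
    """Enumerate every contiguous span of 1..max_area_size items (hypothetical helper)."""
    area_keys, area_values, spans = [], [], []
    n = len(keys)
    for size in range(1, max_area_size + 1):
        for start in range(n - size + 1):
            ks = keys[start:start + size]
            vs = values[start:start + size]
            # Area key: mean of the item keys in the span.
            area_keys.append([sum(k[j] for k in ks) / size for j in range(len(ks[0]))])
            # Area value: sum of the item values in the span.
            area_values.append([sum(v[j] for v in vs) for j in range(len(vs[0]))])
            spans.append((start, start + size))
    return area_keys, area_values, spans


def area_attention_1d(query: Vector, keys: List[Vector], values: List[Vector],
                      max_area_size: int = 3) -> Tuple[List[float], Vector, List[Tuple[int, int]]]:
    area_keys, area_values, spans = build_1d_areas(keys, values, max_area_size)
    # Dot-product score per area, then a stable softmax over all areas.
    scores = [sum(q * k for q, k in zip(query, ak)) for ak in area_keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim_v = len(values[0])
    output = [sum(w * av[j] for w, av in zip(weights, area_values)) for j in range(dim_v)]
    return weights, output, spans


keys = [[1.0], [0.0], [1.0], [0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
weights, output, spans = area_attention_1d([1.0], keys, values, max_area_size=3)
```

With `max_area_size=1` every area is a single item, so this reduces to ordinary attention over the original memory; larger sizes let the model weight whole spans such as words or phrases.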

2D Area Attention

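[Figure: a 2D original memory, such as image grid cells, is expanded into an area memory of 1x1, 1x2, 2x1 and 2x2 rectangular areas; the query attends over the area memory.]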
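For a 2D memory such as image grid cells, areas become rectangles of adjacent cells. The sketch below enumerates every rectangle up to `max_h` x `max_w` cells and pools one area by taking the mean of its cell vectors (as an area key) and their sum (as an area value). It assumes that "1x2" means height 1 and width 2, and it uses a single grid of vectors for both keys and values for brevity; `enumerate_2d_areas` and `pool_area` are hypothetical names.

```python
from typing import List, Tuple

Grid = List[List[List[float]]]  # grid[row][col] -> vector


def enumerate_2d_areas(height: int, width: int,
                       max_h: int, max_w: int) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, area_h, area_w) for every rectangle up to max_h x max_w."""
    areas = []
    for area_h in range(1, max_h + 1):
        for area_w in range(1, max_w + 1):
            for row in range(height - area_h + 1):
                for col in range(width - area_w + 1):
                    areas.append((row, col, area_h, area_w))
    return areas


def pool_area(grid: Grid, row: int, col: int,
              area_h: int, area_w: int) -> Tuple[List[float], List[float]]:
    """Mean (key-style) and sum (value-style) of the cells inside one rectangle."""
    cells = [grid[r][c] for r in range(row, row + area_h)
             for c in range(col, col + area_w)]
    dim = len(cells[0])
    mean_vec = [sum(v[j] for v in cells) / len(cells) for j in range(dim)]
    sum_vec = [sum(v[j] for v in cells) for j in range(dim)]
    return mean_vec, sum_vec


# A 3x3 grid with areas up to 2x2: 9 (1x1) + 6 (1x2) + 6 (2x1) + 4 (2x2) = 25 areas.
areas = enumerate_2d_areas(3, 3, 2, 2)
```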

Features of Each Area

[Figure: the 2D area memory of 1x1, 1x2, 2x1 and 2x2 areas that the query attends over.]

Area Features

Mean

Sum

Max

Standard deviation

Area shape, e.g., 2x2
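The area features listed above can be computed per area as follows. This is a sketch: the population standard deviation and the concatenation of mean, standard deviation and shape into a single key input are assumptions, since the slide lists the features but not how they are combined (a learned projection would normally follow). `area_features` and `area_key_input` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def area_features(items: List[List[float]], shape: Tuple[int, int]) -> Dict[str, object]:
    """Per-dimension mean, sum, max and standard deviation of an area's item vectors."""
    n = len(items)
    cols = [[v[j] for v in items] for j in range(len(items[0]))]
    mean = [sum(c) / n for c in cols]
    total = [sum(c) for c in cols]
    maximum = [max(c) for c in cols]
    # Population standard deviation (divide by n): an assumption of this sketch.
    std = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, mean)]
    return {"mean": mean, "sum": total, "max": maximum, "std": std, "shape": shape}


def area_key_input(features: Dict[str, object]) -> List[float]:
    """Hypothetical combination: concatenate mean, std and (height, width)."""
    h, w = features["shape"]
    return list(features["mean"]) + list(features["std"]) + [float(h), float(w)]


# A 1x2 area holding two 2-dimensional items.
f = area_features([[1.0, 2.0], [3.0, 6.0]], shape=(1, 2))
```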


Area Attention consistently improves upon Transformer & LSTM baselines

[Figure: result charts for Transformer machine translation, LSTM machine translation, and Transformer image captioning.]


Poster session: Tue Jun 11th, 06:30–09:00 PM @ Pacific Ballroom #27

Source code: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/area_attention.py
