Area Attention Yang Li, Lukasz Kaiser, Samy Bengio, Si Si Google Research

Area Attention11-16-00... · 2019-06-08 · Features of Each Area original memory area memory query 1x1 areas 1x2 areas 2x1 areas 2x2 areas Area Features Mean Sum Max Standard deviation

Aug 14, 2020

Page 1:

Area Attention

Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

Google Research

Page 2:

Neural Attentional Mechanisms

[Figure: a query attends over memory items (k1, v1), (k2, v2), (k3, v3), ..., (k|M|, v|M|), with attention weights a1, a2, a3, ..., a|M|.]
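
The mechanism in the figure can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the slide does not say how the query is scored against each key, so dot-product scoring with a softmax is an assumption here, and the names `softmax` and `attend` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output) for one query over a key-value memory."""
    # Assumption: score each memory item by the dot product of query and key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)  # a1 ... a|M|, summing to 1
    dim = len(values[0])
    # The output is the attention-weighted sum of the values.
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, output = attend([0.0, 0.0], keys, values)
```

With an all-zero query every score is equal, so the weights are uniform and the output is the plain average of the values.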

Page 3:

Neural Machine Translation

[Figure: attention in sequence-to-sequence translation. Tokens: A B C D EOS X Y; attention weights: a1, a2, a3, a4.]

Bahdanau, Cho & Bengio, ICLR’15

Luong, Pham & Manning, ACL’15

Page 4:

Image Captioning

Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel & Bengio, ICML’15

Sharma, Ding, Goodman & Soricut, ACL’18

[Figure: attention over image grid cells in image captioning. Tokens: EOS X Y; attention weights: a1, a2, a3, a4.]

Page 5:

Attention-Based Architectures

[Figure: Transformer applied to tokens A B C D.]

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin, NIPS’17

Page 6:

[Figure: a query attends over memory items (k1, v1), (k2, v2), (k3, v3), ..., (k|M|, v|M|), with attention weights a1, a2, a3, ..., a|M|.]

Limitations

The unit of attention is predetermined rather than learned.

Word: Airlines began charging for the first and second checked bags

Character: A r e y o u a t h o m e ?

Image grid cell: [figure]

Page 7:

Research Goal

Enable a model to attend to information at varying granularity. The unit of attention emerges from learning.

Characters to words: A r e y o u a t h o m e ?

Words to phrases: Airlines began charging for the first and second checked bags

Grid cells to objects: [figure]

Page 8:

1D Area Attention

[Figure: the query attends to an area memory built from the original memory: 1-item areas, 2-item areas, and 3-item areas.]
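
As a rough sketch of how the 1D area memory in the figure could be enumerated, each area is a contiguous span of the original memory up to some maximum width. The function name `areas_1d` and the `(start, width)` representation are hypothetical choices for illustration; the linked source code may organize this differently.

```python
def areas_1d(memory_len, max_width):
    """List (start, width) for every contiguous span of up to max_width items."""
    return [
        (start, width)
        for width in range(1, max_width + 1)       # 1-item, 2-item, ... areas
        for start in range(memory_len - width + 1)  # every valid starting position
    ]

# A 4-item memory with areas of up to 3 items:
# four 1-item areas, three 2-item areas, two 3-item areas.
spans = areas_1d(memory_len=4, max_width=3)
```

The area memory is larger than the original memory, so the query has more candidates to attend to, and the granularity it settles on is left to learning.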

Page 9:

2D Area Attention

[Figure: the query attends to an area memory built from the original 2D memory: 1x1 areas, 1x2 areas, 2x1 areas, and 2x2 areas.]
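
The 2D case can be sketched the same way, with each area a rectangle of adjacent grid cells. The name `areas_2d` and the `(top, left, h, w)` tuples are hypothetical, and the 3x3 grid is only an example size.

```python
def areas_2d(height, width, max_h, max_w):
    """List (top, left, h, w) for every rectangle of up to max_h x max_w cells."""
    return [
        (top, left, h, w)
        for h in range(1, max_h + 1)
        for w in range(1, max_w + 1)       # order: 1x1, 1x2, 2x1, 2x2
        for top in range(height - h + 1)
        for left in range(width - w + 1)
    ]

# A 3x3 grid with areas of up to 2x2 cells:
# nine 1x1, six 1x2, six 2x1, and four 2x2 areas.
rects = areas_2d(height=3, width=3, max_h=2, max_w=2)
```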

Page 10:

Features of Each Area

[Figure: 2D original memory, its area memory (1x1, 1x2, 2x1, and 2x2 areas), and the query.]

Area Features

Mean

Sum

Max

Standard deviation

Area shape, e.g., 2x2

[Figure: 1D original memory, its area memory (1-item, 2-item, and 3-item areas), and the query.]
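
The features listed on this slide can be computed per area from the item vectors it covers. The sketch below is illustrative only: `area_features` is a hypothetical helper, the population form of the standard deviation is an assumption (the slide does not say which variant is used), and the shape is reported as a plain item count for a 1D area.

```python
import math

def area_features(items):
    """Summarize one area, given its items as a list of equal-length vectors."""
    n = len(items)
    dim = len(items[0])
    columns = [[item[d] for item in items] for d in range(dim)]
    total = [sum(col) for col in columns]
    mean = [t / n for t in total]
    peak = [max(col) for col in columns]
    # Assumption: population standard deviation, computed per dimension.
    std = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(columns, mean)
    ]
    return {"mean": mean, "sum": total, "max": peak, "std": std, "shape": (n,)}

# A 2-item area whose items are the vectors [1, 2] and [3, 6].
feats = area_features([[1.0, 2.0], [3.0, 6.0]])
```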

Page 11:

Area Attention Consistently Improves upon Transformer & LSTM

Transformer Machine Translation

LSTM Machine Translation

Transformer Image Captioning

Page 12:

Area Attention Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

Google Research

Poster session: Tue Jun 11th, 06:30-09:00 PM @ Pacific Ballroom #27

Source code: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/area_attention.py