Whole Brain Architecture group (全脳アーキテクチャの会) Casual Talk #2 (2016.2.7): Trends in Convolutional Neural Networks. Hosei University Graduate School, master's program, SHIMADA Daiki
SHIMADA Daiki (Twitter: @sheema_sheema)
M1
(!!)
20142
• CNN:
• CNN 26 !!
• ??
• CNN
Convolutional Neural Networks (CNN)
Outline: 1. CNN / 2. / 3. 4. 5. 3D 6. 7. 8. 9. CNN 10. What's Next? ImageNet ...
CNN
Neocognitron (1980) [1]
• 2
[1] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36, 1980.
LeNet (1998) [2]
• Back Propagation (BP)
[2] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 1998.
CNN
Ave./Max Pooling, Local Contrast Normalization (2009) [3]
• CNN
[3] K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun. What is the best multi-stage architecture for object recognition? CVPR, 2009.
ReLU (2011) [4]
[4] X. Glorot, A. Bordes, Y. Bengio. Deep Sparse Rectifier Neural Networks. AISTATS, 2011.
Dropout (2012) [5]
[5] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580, 2012.
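A minimal sketch of the two building blocks named above, ReLU [4] and Dropout [5], written in plain Python for illustration. The dropout shown here is the "inverted" variant (survivors are rescaled during training), which is an assumption of this sketch; [5] instead rescales weights at test time. All names and values are hypothetical.

```python
import random


def relu(values):
    """Rectified linear unit: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng, training=True):
    """Inverted dropout (assumed variant): zero each unit with probability
    drop_prob and rescale the survivors so the expected value is unchanged."""
    if not training or drop_prob == 0.0:
        return list(values)
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)                      # fixed seed for repeatability
hidden = relu([-1.5, 0.2, 3.0, -0.1])       # negative inputs become 0
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
```

At test time the same function is called with `training=False`, which returns the activations unchanged.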
CNN
AlexNet (2012) [6]
• Data Augmentation (8)
[6] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
Network in Network, global ave. pooling (2013) [7]
• (global ave. pooling)
[7] M. Lin, Q. Chen, S. Yan. Network In Network. arXiv: 1312.4400, 2013.
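Global average pooling, as named above, reduces each feature map to a single number by averaging over all spatial positions. The following is a small sketch under the assumption that feature maps are given as nested Python lists; the function name and the sample values are hypothetical.

```python
def global_average_pooling(feature_maps):
    """feature_maps: list of channels, each a 2-D list (H x W).
    Returns one averaged value per channel."""
    pooled = []
    for channel in feature_maps:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


maps = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 8.0]],   # channel 1
]
scores = global_average_pooling(maps)   # one score per channel
```

With one feature map per class, the pooled vector can be passed directly to a softmax, so no fully connected layer is needed at the end of the network.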
CNN
VGG-Net (2014) [8]
• 19 layers
• Small (3x3) convolution filters
[8] K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv: 1409.1556, 2014.
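The appeal of small 3x3 filters can be checked with simple arithmetic: stacked stride-1 3x3 convolutions cover the same receptive field as one larger filter while using fewer weights. The sketch below assumes equal input and output channel counts and ignores biases; the channel count of 64 is a hypothetical example.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (bias ignored) when every layer maps
    `channels` inputs to `channels` outputs."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64                                        # hypothetical width
three_3x3 = conv_weights(3, channels, num_layers=3)  # 27 * C^2
one_7x7 = conv_weights(7, channels)                  # 49 * C^2
same_field = stacked_receptive_field(3, 3) == 7      # both see a 7x7 window
```

Three 3x3 layers therefore need 27/49 of the weights of a single 7x7 layer, and they add two extra non-linearities in between.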
GoogLeNet / Inception (2014 ~ 2015) [9, 10]
• 22 layers
• Auxiliary classifiers, Inception module
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going deeper with convolutions. arXiv: 1409.4842, 2014.
[10] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. Rethinking the Inception Architecture for Computer Vision. arXiv: 1512.00567, 2015.
CNN
SPP-Net (2014) [11]
• CNN
[11] K. He, X. Zhang, S. Ren, J. Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv: 1406.4729, 2014.
All Convolutional Net, guided BP (2014) [12]
• 2
• guided BP
[12] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller. Striving for Simplicity: The All Convolutional Net. arXiv: 1412.6806, 2014.
CNN
Exemplar CNN (2014) [13]
• Data Augmentation
[13] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, T. Brox. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. arXiv: 1406.6909, 2014.
Triplet Network (2014) [14]
• CNN
[14] E. Hoffer, N. Ailon. Deep metric learning using Triplet network. arXiv: 1412.6622, 2014.
CNN
Batch Normalization (2015) [15]
[15] S. Ioffe, C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv: 1502.03167, 2015.
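As a rough illustration of the batch normalization transform in [15], the sketch below normalizes a single feature over one mini-batch and then applies a learnable scale and shift. It covers only the training-time forward pass; the function name, default parameters, and sample values are assumptions for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch to zero mean and unit
    variance, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n   # biased variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [1.0, 2.0, 3.0, 4.0]   # hypothetical mini-batch of one feature
normalized = batch_norm(activations)
```

At inference time, implementations typically replace the mini-batch statistics with running averages collected during training, which this sketch omits.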
Residual Network; ResNet (2015) [16]
• 152 layers
[16] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. arXiv: 1512.03385, 2015.
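The core idea of residual learning in [16] is that a block outputs F(x) + x, so the layers only need to learn the residual F. The sketch below is a toy version that uses small dense layers as a stand-in for convolutions (an assumption made to keep it short); all names and values are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Dense layer used as a stand-in for a convolution:
    weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x)."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(x, zeros, zeros)   # F(x) = 0, so y = x
```

With all weights at zero the block passes its input through unchanged, which is one way to see why very deep stacks of such blocks remain trainable.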
AdaGrad [17]
RMSProp [18]
AdaDelta [19]
Adam [20]
• (AdaGrad)
[17] J. Duchi, E. Hazan, Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12, 2011.
[18] T. Tieleman, G. Hinton. Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2012.
[19] M. D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv: 1212.5701, 2012.
[20] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. arXiv: 1412.6980, 2014.
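To make the differences between these optimizers concrete, here is a scalar sketch of the AdaGrad [17], RMSProp [18], and Adam [20] update rules. The hyperparameter defaults, the `state` dictionary layout, and the toy objective f(x) = x^2 are assumptions for illustration, not settings taken from the talk.

```python
import math


def adagrad_step(param, grad, state, lr=0.1, eps=1e-8):
    """AdaGrad: divide by the root of the sum of all past squared gradients."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + grad * grad
    return param - lr * grad / (math.sqrt(state["sum_sq"]) + eps)


def rmsprop_step(param, grad, state, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: like AdaGrad, but with an exponential moving average."""
    state["avg_sq"] = decay * state.get("avg_sq", 0.0) + (1 - decay) * grad * grad
    return param - lr * grad / (math.sqrt(state["avg_sq"]) + eps)


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: moving averages of the gradient and its square, bias-corrected."""
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    v = state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step, steps=500, **kwargs):
    """Minimize f(x) = x^2 (gradient 2x) starting from x = 1."""
    x, state = 1.0, {}
    for _ in range(steps):
        x = step(x, 2.0 * x, state, **kwargs)
    return x
```

Calling `minimize(adagrad_step)` or `minimize(adam_step, lr=0.01)` shows how each rule adapts the effective step size per parameter.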
CNN /
Deconvnet for visualization
• Deconvolution, Unpooling
[21] M. D. Zeiler, R. Fergus. Visualizing and Understanding Convolutional Networks. arXiv: 1311.2901, 2013.
CNN /
[22] A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them. arXiv: 1412.0035, 2014.
CNN /
• CNN
• Adversarial example
CNN
ostrich !! ostrich !!
[23] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus. Intriguing properties of neural networks. arXiv: 1312.6199, 2013.
[24] I. J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples. arXiv: 1412.6572, 2014.
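One simple way to construct an adversarial example is the fast gradient sign method from [24]: move the input by a small step in the direction of the sign of the loss gradient. The sketch below applies it to a toy logistic-regression classifier rather than a CNN; the weights, input, and epsilon are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, label, epsilon):
    """x_adv = x + epsilon * sign(dLoss/dx) for the cross-entropy loss,
    whose input gradient for this model is (p - label) * w."""
    p = predict(weights, bias, x)
    grad_x = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]


weights, bias = [2.0, -3.0, 1.0], 0.0   # hypothetical toy classifier
x = [0.5, -0.2, 0.1]                    # originally classified as class 1
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.4)
```

Each input component changes by at most `epsilon`, yet `predict(weights, bias, x_adv)` falls on the other side of the 0.5 decision threshold.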
CNN /
CNN
[25] A. Nguyen, J. Yosinski, J. Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. arXiv: 1412.1897, 2014.
R-CNN (2013)
• CV, CNN
[26] R. Girshick, J. Donahue, T. Darrell, J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv: 1311.2524, 2013.
Fast R-CNN (2015/4)
• 1 ()
• CNN, ROI (ROI Pooling)
• CV
[27] R. Girshick. Fast R-CNN. arXiv: 1504.08083, 2015.
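ROI pooling, as named above, turns a region of interest of any size on the CNN feature map into a fixed-size grid by max-pooling within each grid cell. The following is a minimal single-channel sketch; the bin-boundary rounding (floor for the start, ceiling for the end) and the end-exclusive ROI format are assumptions of this sketch, and the names are hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """feature_map: 2-D list; roi: (top, left, bottom, right), end-exclusive.
    Returns an out_h x out_w grid of per-bin maxima."""
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        r0 = top + (i * roi_h) // out_h
        r1 = max(r0 + 1, top + ((i + 1) * roi_h + out_h - 1) // out_h)
        row = []
        for j in range(out_w):
            c0 = left + (j * roi_w) // out_w
            c1 = max(c0 + 1, left + ((j + 1) * roi_w + out_w - 1) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
whole = roi_max_pool(fmap, (0, 0, 4, 4), 2, 2)   # 4x4 region -> 2x2
crop = roi_max_pool(fmap, (1, 1, 4, 4), 2, 2)    # 3x3 region -> 2x2
```

Because the output size is fixed, regions of different shapes can share the same fully connected layers after a single pass of the CNN over the image.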
Faster R-CNN (2015/6)
• CNN (Region Proposal Net)
[28] S. Ren, K. He, R. Girshick, J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv: 1506.01497, 2015.
Fully Convolutional Networks (FCN)
• CNN
• Deconvolution
[29] J. Long, E. Shelhamer, T. Darrell. Fully Convolutional Networks for Semantic Segmentation. arXiv: 1411.4038, 2014.
SegNet
• Pooling
[30] V. Badrinarayanan, A. Handa, R. Cipolla. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. arXiv: 1505.07293, 2015.
CNN + (CRF)
• CRF
• CRF formulated as an RNN (CRF-RNN), combining CNN and CRF
[31] S. Zheng, S. Jayasumana, B. R. Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. H. S. Torr. Conditional Random Fields as Recurrent Neural Networks. arXiv: 1502.03240, 2015.
Deep Mask
[32] P. O. Pinheiro, R. Collobert, P. Dollar. Learning to Segment Object Candidates. arXiv: 1506.06204, 2015.
DeepFace
• 3, CNN
[33] Y. Taigman, M. Yang, M. A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR, 2014.
Spatial Transformer Networks
[34] M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu. Spatial Transformer Networks. arXiv: 1506.02025, 2015.
Deep Dream
• CNN
[35] Inceptionism: Going Deeper into Neural Networks. http://googleresearch.blogspot.ch/2015/06/inceptionism-going-deeper-into-neural.html
[36] K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv: 1312.6034, 2013.
• 3D
[37] A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, T. Brox. Learning to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv: 1411.5928, 2014.
• CNN
[38] L. A. Gatys, A. S. Ecker, M. Bethge. A Neural Algorithm of Artistic Style. arXiv: 1508.06576, 2015.
• CNN, MRF
[39] C. Li, M. Wand. Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis. arXiv: 1601.04589, 2016.
DCGAN
• Adversarial Networks
[40] A. Radford, L. Metz, S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv: 1511.06434, 2015.
Super-Resolution CNN (SRCNN)
• waifu2x [42]
[41] C. Dong, C. C. Loy, K. He, X. Tang. Image Super-Resolution Using Deep Convolutional Networks. arXiv: 1501.00092, 2015.
[42] waifu2x. http://waifu2x.udp.jp/index.ja.html
Deblurring
• CNN, motion kernel, MRF
[43] J. Sun, W. Cao, Z. Xu, J. Ponce. Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal. arXiv: 1503.00593, 2015.
Automatic Colorization CNN
• hypercolumns [45]
Figure panels: original, CNN, human (Reddit)
[44] Automatic Colorization, http://tinyclouds.org/colorize/
[45] B. Hariharan, P. Arbeláez, R. Girshick, J. Malik. Hypercolumns for Object Segmentation and Fine-grained Localization. arXiv: 1411.5752, 2014.
3D
Deep Stereo
• Two towers: Selection Tower (depth) and Color Tower
[46] J. Flynn, I. Neulander, J. Philbin, N. Snavely. DeepStereo: Learning to Predict New Views from the World's Imagery. arXiv: 1506.06825, 2015.
[47] DeepStereo: Learning to Predict New Views from the World's Imagery - YouTube, https://www.youtube.com/watch?v=cizgVZ8rjKA
3D
• CNN
[48] J. Žbontar, Y. LeCun. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches. arXiv: 1510.05970, 2015.
3D
• A single CNN predicts depth, surface normal, and semantic label
[49] D. Eigen, R. Fergus. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. arXiv: 1411.4734, 2014.
Figure panels: input, Eigen et al., proposal, ground truth
• 487 classes (!?), Top-5: 80%
• CNN
[50] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, F. Li. Large-scale Video Classification with Convolutional Neural Networks. CVPR, 2014.
MemNet: CNN for Memorability
• Memorability:
• Memorability score: LaMem
[51] LaMem, http://memorability.csail.mit.edu/
[52] A. Khosla, A. S. Raju, A. Torralba, A. Oliva. Understanding and Predicting Image Memorability at a Large Scale. ICCV, 2015.
• Memorability, CNN
• Rank correlation: 0.64 (MemNet) vs. 0.68 (human)
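The rank correlation quoted for MemNet compares how two score lists order the same images. A minimal sketch follows, assuming Spearman's rank correlation is the measure meant (the slide only says "rank correlation"); the scores below are made-up illustrative values, not LaMem data.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


predicted = [0.9, 0.4, 0.7, 0.2]   # hypothetical model scores
measured = [0.8, 0.6, 0.5, 0.3]    # hypothetical human-derived scores
rho = spearman(predicted, measured)
```

A value of 1 means the two lists order the images identically, so a model at 0.64 against a human consistency of 0.68 orders images almost as consistently as people do.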
• CNN + LSTM
Google NIC [53], LRCN [54]
[53] O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. arXiv: 1411.4555, 2014.
[54] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. arXiv: 1411.4389, 2014.
Figures: Google NIC, LRCN
Visual Turing Test
mQA [55]
Neural-Image QA [56]
DAQUAR
[55] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. arXiv: 1505.05612, 2015.
[56] M. Malinowski, M. Rohrbach, M. Fritz. Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images. ICCV, 2015.
(?)
• Bidirectional RNN, RNN
[57] E. Mansimov, E. Parisotto, J. L. Ba, R. Salakhutdinov. Generating Images from Captions with Attention. arXiv: 1511.02793, 2015.
[58] R. Kiros, R. Salakhutdinov, R. S. Zemel. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. arXiv: 1411.2539, 2014.
CNN
Atari 2600 (Deep Q-Networks)
• Q-Learning with a CNN (DQN)
[59] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv: 1312.5602, 2013.
[60] V. Mnih, et al. Human-level control through deep reinforcement learning. Nature, 2015.
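The Q-learning update behind DQN can be sketched with a small lookup table: the value of the chosen action is moved toward the reward plus the discounted best value of the next state. DQN replaces the table with a CNN that maps game frames to action values; the table, learning rate, and discount factor below are hypothetical and only illustrate the target computation.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max_a' Q(next_state, a').
    Returns the target that was used."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return target


# Two states, two actions; rows are states, columns are actions.
q_table = [[0.0, 0.0], [0.0, 1.0]]
target = q_learning_update(q_table, state=0, action=1, reward=0.5,
                           next_state=1, done=False)
```

In a deep variant, the same `target` would serve as the regression label for the network output of the chosen action.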
CNN
AlphaGo
• Two networks (policy & value) with Monte Carlo tree search (MCTS)
• 19x19 board fed to a CNN
• -> self-play
[61] D. Silver, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
CNN
• AI
• 55, 3
[62] Y. Tian, Y. Zhu. Better Computer Go Player with Neural Network and Long-term Prediction. arXiv: 1511.06410, 2015.
CNN
Asynchronous DQN
• DQN
• 16 actor-learner threads on a single machine
[63] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. arXiv: 1602.01783, 2016.
What's Next?
Visual Genome
• Fei-Fei Li
[64] Visual Genome, https://visualgenome.org/
108,249 images
4.2 million region descriptions
1.7 million visual Q&A
2.1 million object instances (75,729 unique objects)
1.8 million attributes (40,513 unique attributes)