ABEJA, Inc. (Researcher) - Deep Learning
- Computer Vision - Natural Language Processing - Graph Convolution / Graph Embedding
- Mathematical Optimization - https://github.com/TatsuyaShiraka
tech blog → http://tech-blog.abeja.asia/
Poincaré Embeddings / Graph Convolution
We are hiring! → https://www.abeja.asia/recruit/
→ https://six.abejainc.com/
International Conference on Machine Learning
• Top ML Conference
• 434 orals in 3 days
• 9 parallel tracks
• 1,629 submitted papers
• 4 talks from invited speakers
• 9 tutorial talks
• 9 (parallel) × 3 (sessions) × 3 (days) = 81 sessions in the main conference
ICML 2017 in Sydney
Schedule
8/6   Tutorial Session        9 tutorials (3 parallel)    max attend: 1/3
8/7   Main Conference Day 1   27 sessions (9 parallel)    max attend: 1/9
8/8   Main Conference Day 2   27 sessions (9 parallel)    max attend: 1/9
8/9   Main Conference Day 3   27 sessions (9 parallel)    max attend: 1/9
8/10  Workshop Day 1          11 sessions (11 parallel)   max attend: 1/11
8/11  Workshop Day 2          11 sessions (11 parallel)   max attend: 1/11
• Deep learning is still the biggest trend
• Autonomous vehicles
• Health care / computational biology
• Human interpretability and visualization
• Multitask learning for small data or hard tasks
• Reinforcement learning
• Imitation learning (inverse reinforcement learning)
• Language and speech processing
• GANs / CNNs / RNNs / LSTMs are the default options
• RNNs and their variants
• Optimization
• Online learning / bandit
• Time series modeling
• Applications Session
Some Trends (highly biased)
• Gluon is a new deep learning wrapper framework that integrates dynamic DL frameworks (Chainer, PyTorch) and static DL frameworks (Keras, MXNet) to get the best of both worlds ("hybridize")
• Great resources, including many of the latest models: https://github.com/apache/incubator-mxnet/tree/master/example
• Looks easy to write
• Alex Smola was the presenter
• … but not so fast yet?
[Tutorial] Distributed Deep Learning with MxNet Gluon
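The "hybridize" idea can be illustrated with a toy sketch (plain Python, not the real Gluon API): the block runs imperatively (define-by-run) until hybridize() is called, after which the recorded op sequence stands in for a cached static graph.

```python
# Toy illustration of the "hybridize" idea -- NOT the real Gluon API.
class ToyHybridBlock:
    def __init__(self):
        # the network's ops, normally executed one by one in Python
        self.ops = [lambda x: x * 2, lambda x: x + 1]
        self.cached = None

    def hybridize(self):
        # capture the op sequence once; stands in for compiling a static graph
        def compiled(x):
            for op in self.ops:
                x = op(x)
            return x
        self.cached = compiled

    def __call__(self, x):
        if self.cached is not None:
            return self.cached(x)   # "static" cached execution
        out = x
        for op in self.ops:         # dynamic, imperative execution
            out = op(out)
        return out

net = ToyHybridBlock()
dynamic_out = net(3)   # imperative mode
net.hybridize()
static_out = net(3)    # cached mode, same result
```

Both modes compute the same function; the point of the real hybridize is that the cached graph can then be optimized and run without Python overhead.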
http://www-bcf.usc.edu/~liu32/icml_tutorial.pdf
• RNNs work well
• + pretraining (combine other clinics’ data)
• + expert defined features
• + new models for missing data
• CNNs work well on image data and have achieved super-human accuracy
• Some Features of Health Care Data
• Small sample size
• Missing values
• Medical domain knowledge
• Interpretation
• Use gradient-boosted trees to mimic deep learning models (cool idea!)
• Hard to annotate even for experts
• Big Small Data
• Limited amount of data available to train age-specific or disease-specific models
[Tutorial] Deep Learning Models for Health Care: Challenges and Solutions
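The "mimic a deep model with gradient-boosted trees" idea can be sketched in a dependency-free toy (my own illustration, not the tutorial's code): a fixed nonlinear function stands in for the deep model's soft predictions, and hand-rolled regression stumps are boosted to imitate it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
# stand-in for a trained deep model's soft predictions (the "teacher")
teacher = np.tanh(X[:, 0]) + 0.5 * np.sin(2 * X[:, 0])

def fit_stump(x, r):
    # best single-threshold regression stump on residuals r
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = x <= t
        pred = np.where(left, r[left].mean(), r[~left].mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, r[left].mean(), r[~left].mean())
    return best[1:]

# gradient boosting for squared loss: repeatedly fit stumps to residuals
pred = np.zeros_like(teacher)
lr = 0.3
for _ in range(100):
    t, lv, rv = fit_stump(X[:, 0], teacher - pred)
    pred += lr * np.where(X[:, 0] <= t, lv, rv)

mimic_mse = np.mean((pred - teacher) ** 2)
```

The boosted-tree "student" ends up closely tracking the teacher's outputs while remaining inspectable as a list of simple threshold rules.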
Future Directions:
- Modeling heterogeneous data sources
- Model interpretation
- More complex outputs
“Interpretable Deep Models for ICU Outcome Prediction”, 2016
• Deep Neural Networks are “black boxes”.
• Sensitivity analysis methods can be applied
• e.g., Grad-CAM
[Tutorial] Interpretable Machine Learning
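A minimal sketch of the sensitivity-analysis idea (toy model of my own, not from the tutorial): probe a black-box model with finite differences, which stand in for the backprop gradients that methods like Grad-CAM use, and rank inputs by how strongly the output reacts.

```python
import numpy as np

# toy "black box": only the first two inputs actually matter
def model(x):
    return 3.0 * x[0] ** 2 + 0.1 * x[1] + 0.0 * x[2]

def sensitivity(f, x, eps=1e-4):
    # central finite differences as a stand-in for backprop gradients
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return np.abs(g)

s = sensitivity(model, np.array([1.0, 1.0, 1.0]))
# s ranks input features by influence on the output
```

Real interpretability methods compute these gradients analytically and aggregate them over feature maps, but the ranking logic is the same.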
• Generating periodic patterns with GANs
• Local/Global/Periodic vectors
“Learning Texture Manifolds with the Periodic Spatial GAN”
[Figure: generated examples with many textures and many periodicities; the noise tensor combines local, global, and periodic vectors]
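A sketch of how the three noise parts could be assembled (illustrative shapes and wave vectors of my own choosing, not the paper's settings): local vectors are i.i.d. per spatial location, global vectors are shared across the whole map, and periodic vectors are sinusoids over the spatial grid.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8                  # spatial size of the noise tensor (illustrative)
d_l, d_g, d_p = 4, 2, 2    # channel counts for local/global/periodic parts

# local: independent noise at every spatial position
local = rng.normal(size=(d_l, H, W))

# global: one vector broadcast over the whole spatial grid
glob = np.tile(rng.normal(size=(d_g, 1, 1)), (1, H, W))

# periodic: sinusoids with (here random, in PSGAN learnable) wave vectors
k = rng.uniform(0.2, 1.0, size=(d_p, 2))
ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
periodic = np.stack([np.sin(k[c, 0] * ii + k[c, 1] * jj) for c in range(d_p)])

# generator input: concatenate the three parts along the channel axis
z = np.concatenate([local, glob, periodic], axis=0)
```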
• Sequence revision with generative/inference models
• Generative model P(x, y, z)=P(x, y|z)P(z)
• x : input seq., y: goodness of x, z: hidden var.
• Inference model P(z|x) , P(y|z)
• Input x0 -> infer z0 -> search better z (better F(z)) -> reconstruct x
“Sequence to better sequence: Continuous Revision of Combinatorial Structures”
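The revision loop (infer z0, search a better z, reconstruct) can be sketched with a one-dimensional toy in place of the learned encoder/decoder and goodness model (all three functions below are my own stand-ins, not the paper's models):

```python
# toy stand-ins: decoder x = 2z, encoder z = x/2, goodness F(z) = -(z - 3)^2
decode = lambda z: 2.0 * z
encode = lambda x: x / 2.0

x0 = 1.0
z = encode(x0)             # step 1: infer z0 from the input sequence
for _ in range(200):       # step 2: search a better z by gradient ascent on F
    grad_F = -2.0 * (z - 3.0)
    z += 0.05 * grad_F
x_new = decode(z)          # step 3: reconstruct the revised output
```

In the paper the search happens in the latent space of a trained sequence model, which is what makes continuous revision of discrete/combinatorial structures possible.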
• Gave a new algorithm and theoretical analysis for sum of norms (SON) clustering
• SON (2011)
• Assigns a center to each data point and applies a regularization term that pulls centers together
• Convex problem!
“Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery”
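A minimal 1-D sketch of the SON objective (toy data and plain subgradient descent, not the paper's stochastic incremental algorithm): each point gets its own center, and an l1 fusion penalty pulls centers together until within-cluster centers collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 1-D clusters
x = np.concatenate([rng.normal(0, 0.1, 5), rng.normal(5, 0.1, 5)])
u = x.copy()              # one center per data point
lam, eps = 0.15, 1e-8

# objective: 0.5 * sum_i (u_i - x_i)^2 + lam * sum_{i<j} |u_i - u_j|
for _ in range(500):      # plain (sub)gradient descent on the SON objective
    diff = u[:, None] - u[None, :]
    dist = np.sqrt(diff ** 2 + eps)          # smoothed |u_i - u_j|
    grad = (u - x) + lam * (diff / dist).sum(axis=1)
    u -= 0.05 * grad

# centers within each cluster fuse; the two groups stay far apart
```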
Image Compression using Deep Learning
• VAE (coarse reconstruction) + GAN (refinement)
• Faster than JPEG on a GPU, but takes several seconds on a CPU
“Real-Time Adaptive Image Compression”
• Subgoals
• Breaking the problem up into subgoals
• Learn sub-policies to achieve them
• StreetLearn
• Transfer Learning
• Progressive Neural Networks
• Distral: Robust Multitask Reinforcement Learning
[Invited Talk] “Towards Reinforcement Learning in the Complex World” - Raia Hadsell (Google DeepMind)
• GANs are approximated by a discrete distribution on a finite set of samples (with high probability)
• Sample size = Õ(p·log(p/ε)/ε²), where p = discriminator size and ε = error
• The “birthday paradox” test
• Sample m images from the generator
• Check whether there are duplicate images
• Estimate the support size
“Generalization and Equilibrium in Generative Adversarial Nets” & “Do GANs actually learn the distribution? Some theory and empirics”
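The birthday-paradox test can be sketched numerically (toy numbers, with a known discrete distribution standing in for the generator): duplicates become likely once the batch size m is around √(2·support), so the batch size where collisions start appearing estimates the support size as roughly m².

```python
import numpy as np

rng = np.random.default_rng(0)
support = 500     # hidden support size of the toy "generator" (assumed known here)
trials = 200

def dup_rate(m):
    # fraction of batches of size m that contain at least one duplicate sample
    hits = sum(len(np.unique(rng.integers(0, support, size=m))) < m
               for _ in range(trials))
    return hits / trials

# small batches rarely collide; batches near sqrt(2 * support) collide often
rate_small = dup_rate(10)   # expected ~ 1 - exp(-100/1000)  ≈ 0.10
rate_large = dup_rate(60)   # expected ~ 1 - exp(-3600/1000) ≈ 0.97
```

With a real GAN, "duplicate" means visually near-identical images found by nearest-neighbor search, but the collision arithmetic is the same.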
• Deterministic Rounding vs. Stochastic Rounding
• Theoretical explanation that SGD with stochastic rounding does not converge well
• Every update is too noisy
• Won the Google Best Student Paper Award
“Towards a Deeper Understanding of Training Quantized Networks”
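The two rounding schemes contrast as follows (a minimal numpy sketch of my own): deterministic rounding is biased but noise-free, while stochastic rounding is unbiased in expectation but makes every individual update noisy.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, rng):
    # round up with probability equal to the fractional part (unbiased)
    lo = np.floor(x)
    return lo + (rng.random(x.shape) < (x - lo))

x = np.full(10000, 0.3)
det = np.round(x)                 # deterministic: always 0 here, biased
sto = stochastic_round(x, rng)    # each value is 0 or 1; mean ≈ 0.3, unbiased
```

In quantized training this trade-off is exactly the issue the paper analyzes: the unbiasedness that makes stochastic rounding attractive also injects variance into every SGD step.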
• RL produces much better sequences than log-likelihood-based methods
• Why is RL so effective? (beam search issues?)
“Sequence-Level Training of Neural Models for Visual Dialog”
• Google’s Expander, which enhances a broad range of tasks using graph structure
• smart reply, personal assistant
• image recognition
• Integrated framework for
• zero-shot/one-shot learning
• multi-modal learning
• semi-supervised learning
• multi-task learning
• “Neural Graph Machines”
• introduces graph regularization into DL
• Adjacent nodes (data points) are constrained to have nearby vector representations
Neural Graph Learning
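The graph regularization term can be sketched in a few lines (toy embeddings and edges of my own; the real loss also includes the supervised task term the model is trained on):

```python
import numpy as np

# toy node representations and a small graph:
# the regularizer pulls adjacent nodes' embeddings together
h = np.array([[0.0, 0.0],    # node 0
              [1.0, 1.0],    # node 1
              [5.0, 5.0]])   # node 2
edges = [(0, 1), (1, 2)]     # adjacency
alpha = 0.1                  # regularization weight (illustrative)

supervised_loss = 0.0        # stands in for the usual task loss
graph_reg = sum(np.sum((h[u] - h[v]) ** 2) for u, v in edges)
total_loss = supervised_loss + alpha * graph_reg
```

Minimizing total_loss during training nudges connected nodes (here 1 and 2, whose embeddings are far apart) toward similar representations, which is how unlabeled graph structure regularizes the supervised model.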