ADVANCES IN TRINITY OF AI: DATA, ALGORITHMS & COMPUTE

Anima Anandkumar, Bren Professor at Caltech; Director of ML Research at NVIDIA

Source: tensorlab.cms.caltech.edu › users › anima › slides › NYU-oct2018.pdf (posted Jun 27, 2020)
Transcript
Page 1:

ADVANCES IN TRINITY OF AI: DATA, ALGORITHMS & COMPUTE

Anima Anandkumar

Bren Professor at Caltech; Director of ML Research at NVIDIA

Page 2:

ALGORITHMS • OPTIMIZATION • SCALABILITY • MULTI-DIMENSIONALITY

DATA • COLLECTION • AGGREGATION • AUGMENTATION

INFRASTRUCTURE (FULL STACK FOR ML) • APPLICATION SERVICES • ML PLATFORM • GPUS

TRINITY FUELING ARTIFICIAL INTELLIGENCE

Page 3:

• COLLECTION: ACTIVE LEARNING, PARTIAL LABELS..

• AGGREGATION: CROWDSOURCING MODELS..

• AUGMENTATION: GENERATIVE MODELS, SYMBOLIC EXPRESSIONS..

DATA

Page 4:

ACTIVE LEARNING

Labeled data

Unlabeled data

Goal: Reach SOTA with a smaller dataset

• Active learning is well analyzed in theory
• In practice, it has been applied only to small classical models

Can it work at scale with deep learning?

Page 5:

TASK: NAMED ENTITY RECOGNITION

Page 6:

RESULTS: NER task on the largest open benchmark (OntoNotes)

Active learning heuristics:

• Least confidence (LC)
• Maximum normalized log-probability (MNLP)

Deep active learning matches:
• SOTA with just 25% of the data on English, 30% on Chinese.
• The best shallow model (trained on full data) with 12% of the data on English, 17% on Chinese.
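As a minimal sketch (not the paper's implementation; the array and function names are hypothetical), the two heuristics can be scored from a model's per-token log-probabilities and used to rank unlabeled sentences:

```python
import numpy as np

# token_logprobs: log-probabilities of the model's best tag for each token.
def least_confidence(token_logprobs):
    # LC: 1 minus the probability of the most likely label sequence.
    return 1.0 - np.exp(np.sum(token_logprobs))

def mnlp(token_logprobs):
    # MNLP: negated log-probability normalized by sentence length, so long
    # sentences are not automatically ranked as the most uncertain.
    return -np.mean(token_logprobs)

# Rank unlabeled sentences; annotate the most uncertain ones first.
sentences = [np.array([-0.05, -0.02]), np.array([-0.9, -1.2, -0.8])]
ranked = sorted(range(len(sentences)), key=lambda i: -mnlp(sentences[i]))
```

Length normalization is the point of MNLP: under plain sequence probability, long sentences always look uncertain simply because they multiply many token probabilities.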

Test F1 score vs. % of labeled words

English and Chinese results (from the paper, published as a conference paper at ICLR 2018):

[Figure 1: F1 score on the test dataset vs. percent of words annotated, for (a) OntoNotes-5.0 English (F1 roughly 70–85) and (b) OntoNotes-5.0 Chinese (F1 roughly 65–75). Curves: MNLP, LC, RAND, Best Deep Model, Best Shallow Model.]

[Figure 2: Genre distribution of the top 1,000 sentences chosen by an active learning algorithm.]

Detection of under-explored genres: To better understand how active learning algorithms choose informative examples, we designed the following experiment. The OntoNotes datasets consist of six genres: broadcast conversation (bc), broadcast news (bn), magazine (mz), newswire (nw), telephone conversation (tc), and weblogs (wb). We created three training datasets: half-data, which contains a random 50% of the original training data; nw-data, which contains sentences only from newswire (51.5% of words in the original data); and no-nw-data, which is the complement of nw-data. We then trained the CNN-CNN-LSTM model on each dataset. The model trained on half-data achieved 85.10 F1, significantly outperforming the models trained on biased datasets (no-nw-data: 81.49, nw-only-data: 82.08). This showed the importance of good genre coverage in training data. We then analyzed the genre distribution of the 1,000 sentences MNLP chose for each model (see Figure 2). For no-nw-data, the algorithm chose many more newswire (nw) sentences than it did for unbiased half-data (367 vs. 217). Conversely, it undersampled newswire sentences for nw-only-data and increased the proportion of broadcast news and telephone conversation, genres distant from newswire. Impressively, although the genre of sentences was never provided to the algorithm, it automatically detected under-explored genres.



Page 7:

• Uncertainty sampling works; normalizing for length helps in the low-data regime.

• With active learning, deep models beat shallow ones even in the low-data regime.

• With active learning, SOTA is achieved with far fewer samples.

TAKE-AWAY

Page 8:

ACTIVE LEARNING WITH PARTIAL FEEDBACK

[Diagram: images are labeled by answering binary questions ("dog?"), yielding dog / non-dog as partial labels.]

• Hierarchical class labeling: labor is proportional to the number of binary questions asked
• Actively pick informative questions
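The labeling-cost idea can be sketched as a walk down a class hierarchy, where each level costs one binary question; the toy hierarchy and helper below are hypothetical, for illustration only:

```python
# Toy class hierarchy: each level is resolved by one binary question
# ("dog?"), so annotation cost equals the depth at which the class sits.
hierarchy = {
    'root': ['animal', 'vehicle'],
    'animal': ['dog', 'non-dog'],
    'non-dog': ['cat', 'horse'],
}

def questions_to_label(hierarchy, target, node='root', depth=0):
    # Cost of fully labeling `target`: number of questions on its root path.
    if node == target:
        return depth
    for child in hierarchy.get(node, []):
        d = questions_to_label(hierarchy, target, child, depth + 1)
        if d is not None:
            return d
    return None
```

Stopping early (answering "dog?" but never refining "non-dog" further) leaves a partial label, which is exactly the feedback the slide proposes to learn from.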

Page 9:

RESULTS ON TINY IMAGENET (100K SAMPLES)

• Yields 8% higher accuracy at 30% of questions (vs. Uniform)

• Obtains full annotation with 40% fewer binary questions

[Chart: Accuracy (0.1–0.5) vs. percentage of binary questions asked (0%–100%) for four strategies: Uniform (inactive data, inactive questions), AL-ME (active data, inactive questions), AQ-ERC (inactive data, active questions), and ALPF-ERC (active data, active questions). ALPF-ERC reaches full-annotation accuracy with 40% fewer questions and gives +8% accuracy at the 30% mark.]

Page 10:

• Don’t annotate from scratch

• Select questions actively based on the learned model

• Don’t sleep on partial labels

• Re-train model from partial labels

TWO TAKE-AWAYS

Page 11:

CROWDSOURCING: AGGREGATION OF CROWD ANNOTATIONS

Majority rule
• Simple and common.
• Wasteful: ignores the varying quality of different annotators.

Annotator-quality models
• Can improve accuracy.
• Hard: quality must be estimated without ground truth.

Page 12:

PROPOSED CROWDSOURCING ALGORITHM

Repeat:
• Compute the posterior of the ground-truth labels, given the annotator-quality model and the noisy crowdsourced annotations.
• Train the model with a weighted loss, using the posterior as weights.
• Use the trained model to infer ground-truth labels.
• MLE step: update annotator quality using the labels inferred by the model.
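A minimal EM-style sketch of the aggregation loop, assuming a one-coin annotator model (each worker has a single accuracy); the model-retraining step with a posterior-weighted loss is omitted here:

```python
import numpy as np

def em_aggregate(votes, n_iter=20):
    # votes: (n_items, n_workers) array of binary labels from the crowd.
    n_items, n_workers = votes.shape
    quality = np.full(n_workers, 0.7)      # initial guess at annotator accuracy
    for _ in range(n_iter):
        # E-step: posterior that each item's true label is 1.
        like1 = np.prod(np.where(votes == 1, quality, 1 - quality), axis=1)
        like0 = np.prod(np.where(votes == 0, quality, 1 - quality), axis=1)
        post = like1 / (like1 + like0)
        # M-step: MLE of each worker's accuracy from the inferred labels.
        agree = votes * post[:, None] + (1 - votes) * (1 - post[:, None])
        quality = agree.mean(axis=0).clip(0.01, 0.99)
    return post, quality

votes = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]])
post, quality = em_aggregate(votes)
```

Unlike majority rule, the loop downweights the unreliable third worker automatically, with no ground truth required.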

Page 13:

LABELING ONCE IS OPTIMAL: BOTH IN THEORY AND PRACTICE

MS-COCO dataset. Fixed budget: 35k annotations. [Chart: performance vs. number of workers per sample; +5% w.r.t. majority rule.]

Theorem: Under a fixed budget, generalization error is minimized with a single annotation per sample.

Assumptions:
• The best predictor is accurate enough (under no label noise).
• Simplified case: all workers have the same quality.
• Probability of being correct > 83%.
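The trade-off behind the theorem can be checked with a back-of-envelope calculation (not the theorem's proof): with k annotations per sample under a fixed budget, majority vote gives cleaner labels but k times fewer samples.

```python
from math import comb

def majority_accuracy(k, p):
    # Probability that the majority of k independent annotators (k odd),
    # each correct with probability p, agrees with the true label.
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

budget = 35_000
for k in (1, 3, 5):
    n_samples = budget // k
    print(f"{k} label(s)/sample: {n_samples} samples, "
          f"label accuracy {majority_accuracy(k, 0.9):.3f}")
```

At p = 0.9, going from 1 to 3 labels only lifts label accuracy from 0.900 to 0.972 while cutting the training set to a third, which is the regime where single annotation wins.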

Page 14:

DATA AUGMENTATION 1: GENERATIVE MODELING

Merits
• Captures the statistics of natural images
• Learnable

Perils
• Feedback is real vs. fake, which differs from the prediction objective
• Introduces artifacts

GAN

Page 15:

PREDICTIVE VS GENERATIVE MODELS

[Diagram: predictive model P(y | x) vs. generative model P(x | y).]

One model to do both?

• SOTA prediction comes from CNN models.
• What class of p(x | y) yields CNN models for p(y | x)?

Page 16:

NEURAL DEEP RENDERING MODEL (NRM)

[Diagram: the object category y generates latent variables and intermediate renderings, which produce the image x.]

Design joint priors for latent variables based on reverse-engineering CNN predictive architectures.

Page 17:

NEURAL RENDERING MODEL (NRM)

[Diagram, NRM generation (top-down) vs. CNN inference (bottom-up): generation starts from the label ("1.0 dog"), chooses whether to render, upsamples and selects locations, and renders: class template → masked template → upsampled template → rendered image. Inference runs from the image through unpooled, pooled, and rectified feature maps to class posteriors ("0.5 dog, 0.2 cat, 0.1 horse, …").]

Page 18:

MAX-MIN CROSS-ENTROPY ➡ MAX-MIN NETWORKS

Cross-entropy loss for training the CNNs with labeled data:

min_θ H_{p,q}(y | x, z_max) ≈ min_{(z_i)_{i=1}^n, θ} (1/n) Σ_{i=1}^n −log p(y_i | x_i, z_i; θ)

Max-min loss for training the CNNs with labeled data:

α_max · H_{p,q}(y | x, z_max) + α_min · H_{p,q}(y | x, z_min)

[Diagram: the input image feeds a max-cross-entropy network and a min-cross-entropy network with shared weights; their outputs combine into the max-min cross-entropy.]

• Max cross-entropy maximizes the posteriors of correct labels. Min cross-entropy minimizes the posteriors of incorrect labels.

• Co-learning: Max and Min networks try to learn from each other
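As an illustrative simplification only (the actual NRM formulation optimizes over latent rendering paths z_max / z_min, which is not reproduced here), a max-min-style objective on class posteriors combines a "max" term that raises the correct posterior with a "min" term that suppresses incorrect ones; the alpha weights are hypothetical:

```python
import numpy as np

def max_min_loss(logits, y, alpha_max=1.0, alpha_min=0.5):
    # Softmax posteriors over classes (numerically stabilized).
    z = logits - np.max(logits)
    p = np.exp(z) / np.sum(np.exp(z))
    max_term = -np.log(p[y])                            # raise correct posterior
    min_term = -np.sum(np.log(1.0 - np.delete(p, y)))   # suppress incorrect ones
    return alpha_max * max_term + alpha_min * min_term

confident = max_min_loss(np.array([8.0, 0.0, 0.0]), y=0)
uniform = max_min_loss(np.array([0.0, 0.0, 0.0]), y=0)
```

A confident, correct prediction incurs a much smaller loss than an uninformative one under both terms.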

Page 19:

STATISTICAL GUARANTEES FOR THE NRM

Bound on the generalization error: Risk ≤ …

• Rendering path normalization (RPN): a new form of regularization
• The training loss of the CNN is equivalent to the likelihood in the NRM
• Max-Min NRM with RPN achieves SOTA on benchmarks

Page 20:

DATA AUGMENTATION 2: SYMBOLIC EXPRESSIONS

Goal: Learn a domain of functions (sin, cos, log, add, …)

• Training on numerical input–output pairs alone does not generalize.

Solution: data augmentation with symbolic expressions

• Symbolic expressions efficiently encode relationships between functions.
• Design networks that use both symbolic and numerical data.

Page 21:

ARCHITECTURE : TREE LSTM

sin²(𝜃) + cos²(𝜃) = 1        sin(−2.5) ≈ −0.6

• Symbolic expression trees. Function evaluation tree.

• Decimal trees: encode numbers with decimal representation (numerical).

• Can encode any expression, function evaluation and number.

[Decimal tree for 2.5]
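The tree encoding can be illustrated with a tiny recursive evaluator over expression tuples; this is illustrative only (the slide's model is a TreeLSTM over such trees, and the tuple format here is hypothetical):

```python
import math

# Each node is (op, *children); leaves carry constants.
def evaluate(node):
    op, *children = node
    if op == 'const':
        return children[0]
    fns = {'sin': math.sin, 'cos': math.cos,
           'add': lambda a, b: a + b, 'mul': lambda a, b: a * b}
    return fns[op](*[evaluate(c) for c in children])

# sin^2(0.5) + cos^2(0.5), which the identity on the slide says equals 1.
expr = ('add',
        ('mul', ('sin', ('const', 0.5)), ('sin', ('const', 0.5))),
        ('mul', ('cos', ('const', 0.5)), ('cos', ('const', 0.5))))
```

The same tree shape can hold symbolic identities (for verification) and numeric evaluations like sin(−2.5), which is what lets one network consume both kinds of data.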

Page 22:

RESULTS

• Vastly improved numerical evaluation: 90% over a function-fitting baseline.

• Generalization to verifying symbolic equations of greater depth.

• Combining symbolic and numerical data improves generalization on both tasks: symbolic verification and numerical evaluation.

Accuracy:
• LSTM (symbolic): 76.40%
• TreeLSTM (symbolic): 93.27%
• TreeLSTM (symbolic + numeric): 96.17%

Page 23:

• OPTIMIZATION : ANALYSIS OF CONVERGENCE

• SCALABILITY : GRADIENT QUANTIZATION

• MULTI-DIMENSIONALITY : TENSOR ALGEBRA

ALGORITHMS

Page 24:

DISTRIBUTED TRAINING INVOLVES COMPUTATION & COMMUNICATION

Parameter server

GPU 1 GPU 2

With 1/2 data With 1/2 data

Page 25:

DISTRIBUTED TRAINING INVOLVES COMPUTATION & COMMUNICATION

Parameter server

GPU 1 GPU 2

With 1/2 data With 1/2 data

Compress? Compress?

Compress?

Page 26:

DISTRIBUTED TRAINING BY MAJORITY VOTE

Parameter server

GPU 1, GPU 2, GPU 3

sign(g)

sign(g)

sign(g)

Parameter server

GPU 1, GPU 2, GPU 3

sign [sum(sign(g))]
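The two messages on the slide, sign(g) from each worker and sign[sum(sign(g))] back from the server, amount to an elementwise majority vote; a minimal NumPy sketch of one step:

```python
import numpy as np

# Majority-vote signSGD step: each worker transmits 1 bit per parameter
# (the sign of its gradient); the server returns the sign of the vote.
def majority_vote_step(params, worker_grads, lr=0.01):
    signs = np.sign(worker_grads)        # (n_workers, n_params), entries ±1 (or 0)
    vote = np.sign(signs.sum(axis=0))    # elementwise majority across workers
    return params - lr * vote

params = np.zeros(4)
grads = np.array([[ 1.0, -2.0,  0.5, -0.1],
                  [ 3.0, -0.2, -0.4, -0.3],
                  [-0.5, -1.0,  0.7,  0.2]])
new_params = majority_vote_step(params, grads, lr=0.1)
# vote is [+1, -1, +1, -1], so params move to [-0.1, 0.1, -0.1, 0.1]
```

Note that gradient magnitudes are discarded on both links, which is where the communication saving comes from.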

Page 27:

SIGNSGD PROVIDES "FREE LUNCH"

Throughput gain with only tiny accuracy loss

p3.2xlarge machines on AWS, ResNet-50 on ImageNet

Page 28:

SIGNSGD ACROSS DOMAINS AND ARCHITECTURES

Huge throughput gain!

Page 29:

TAKE-AWAYS FOR SIGN-SGD

• Convergence even under biased gradients and noise.

• Faster than SGD in theory and in practice.

• For distributed training, similar variance reduction as SGD.

• In practice, similar accuracy but with far less communication.

Page 30:

TENSORS FOR LEARNING IN MANY DIMENSIONS

Tensors: beyond the 2D world

Modern data is inherently multi-dimensional

Page 31:

Images: 3 dimensions Videos: 4 dimensions

TENSORS FOR MULTI-DIMENSIONAL DATA AND HIGHER ORDER MOMENTS

Pairwise correlations Triplet correlations

Page 32:

OPERATIONS ON TENSORS: TENSOR CONTRACTION

Tensor contraction extends the notion of the matrix product.

Matrix product: Mv = Σ_j v_j M_j

Tensor contraction: T(u, v, ·) = Σ_{i,j} u_i v_j T_{i,j,:}
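The matrix product and tensor contraction above can both be written directly in einsum index notation; a short NumPy sketch:

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])
Mv = np.einsum('ij,j->i', M, v)          # matrix product: sum_j v_j M[:, j]

T = np.arange(24.0).reshape(2, 3, 4)
u = np.array([1.0, -1.0])
Tuv = np.einsum('i,j,ijk->k', u, v, T)   # contraction: sum_{i,j} u_i v_j T[i,j,:]
```

Contracting a 3-way tensor against two vectors leaves one free index, so the result is a vector, just as Mv contracts one index of a matrix and leaves a vector.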

Page 33:

DEEP NEURAL NETS: TRANSFORMING TENSORS

Page 34:

DEEP TENSORIZED NETWORKS

Page 35:

SPACE SAVING IN DEEP TENSORIZED NETWORKS
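One way to see the saving: compare the parameter count of a dense layer against a low-rank factorization of the same weights reshaped into a higher-order tensor. The sizes and rank below are hypothetical, chosen only to illustrate the scale of the compression:

```python
# Dense 4096x4096 layer vs. a rank-50 CP factorization of the same weights
# reshaped into a 64x64x64x64 tensor (illustrative sizes, not from the slide).
def dense_params(d_in, d_out):
    return d_in * d_out

def cp_params(shape, rank):
    # A CP decomposition stores one (dim x rank) factor matrix per tensor mode.
    return sum(dim * rank for dim in shape)

dense = dense_params(4096, 4096)              # 16,777,216 parameters
compressed = cp_params((64, 64, 64, 64), 50)  # 12,800 parameters
```

Here the factorized form is over 1000x smaller; in practice the achievable rank depends on how much accuracy loss is acceptable.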

Page 36:

TENSORS FOR LONG-TERM FORECASTING

Tensor Train RNNs and LSTMs

Challenges:
• Long-term dependencies
• High-order correlations
• Error propagation

Page 37:

Climate dataset; Traffic dataset

TENSOR LSTM FOR LONG-TERM FORECASTING

Page 38:

TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA

• Python programming

• User-friendly API

• Multiple backends: flexible + scalable

• Example notebooks in repository

Page 39:

A New Vision for Autonomy

Center for Autonomous Systems and Technologies

Page 40:

CAST: BRINGING ROBOTICS AND AI TOGETHER

Page 41:

FIRST SET OF RESULTS: LEARNING TO LAND

Page 42:

Page 43:

SOME RESEARCH LEADERS AT NVIDIA

Robotics: Dieter Fox
Learning & Perception: Jan Kautz
Chief Scientist: Bill Dally
Graphics: Dave Luebke, Alex Keller, Aaron Lefohn
Architecture: Steve Keckler, Dave Nellans, Mike O'Connor
Programming: Michael Garland
VLSI: Brucek Khailany
Circuits: Tom Gray
Networks: Larry Dennison
Computer vision: Sanja Fidler
Core ML: Me!
Applied research: Bryan Catanzaro

Page 44:

• DATA

• Collection: Active learning and partial feedback

• Aggregation: Crowdsourcing models

• Augmentation: Graphics rendering + GANs, Symbolic expressions

• ALGORITHMS

• Convergence: SignSGD has good rates in theory and practice

• Scalability: SignSGD has same variance reduction as SGD for multi-machine

• Multi-dimensionality: Tensor algebra for neural networks and probabilistic models.

• INFRASTRUCTURE:

• Frameworks: TensorLy is a high-level API for deep tensorized networks.

CONCLUSION

AI needs integration of data, algorithms and infrastructure

Page 45:

COLLABORATORS (LIMITED LIST)

Page 46:

Thank you