1
Approved for Public Release: Distribution Unlimited. AFOTEC Public Affairs Public Release Number 2017-01
Machine Learning: Overview & Applications to Test
1st Lt Takayuki Iguchi
1st Lt Megan E. Lewis
AFOTEC/Det 5/DTS
Release Date: 6 MAR 17
2
Why use Machine Learning in test?
• It takes more time to analyze large, high-dimensional data than it does to collect it
  − Video
  − Audio
  − Bus data
• Machine learning is designed to work with large, high-dimensional data
3
Visualizing Large High Dimensional Data
9
What is Machine Learning?
• A computer program is said to learn from experience 𝐸 with respect to some class of tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by 𝑃, improves with experience 𝐸.
• “The field of study that gives computers the ability to learn without being explicitly programmed.”
10
Types of Machine Learning
• Reinforcement Learning
− Learn to select actions that maximize the accumulated reward over time
• Unsupervised Learning
− Infer a function from unlabeled training data
• Supervised Learning
− Infer a function from labeled training data
11
Types of Machine Learning
• Unsupervised Learning
  “These things are similar” (van der Maaten [2008])
  “These things will add up to something that will look like a 2.” (Hinton [2013])
12
Types of Machine Learning
• Supervised Learning
  “These are camels. Those are people.” (ImageNet [2014])
  “This is the correct salary of a professor given the time since highest degree earned.” (Weisberg [1985])
  [Figure: simple linear regression]
13
Unsupervised Learning Tasks
• Anomaly detection / outlier detection
• Dimensionality reduction
• Manifold learning
• Clustering
14
Anomaly Detection
• As instrumentation has improved, the limiting factor is often no longer too little data but too much
• In flight test there is often only a small time window between sorties, so a quick, cursory data analysis is desired but not currently practical
• Anomaly detection methods can help identify otherwise hidden issues (not detected by aircrew) before they grow into larger problems
15
Anomaly Detection
• Perform problem identification with logged & uncontrollable factors
  − A variety of algorithms and methodologies exist
  − The choice of algorithm & methodology depends on the application and the nature of the data
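As one minimal sketch of the idea (not a method from this briefing), scikit-learn's Isolation Forest can flag records that look unlike the rest of a dataset; the data here is synthetic and the `contamination` setting is illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
nominal = rng.normal(loc=0.0, scale=1.0, size=(200, 3))   # well-behaved "sensor" records
injected = rng.uniform(low=6.0, high=8.0, size=(5, 3))    # obviously anomalous records
X = np.vstack([nominal, injected])

# fit_predict returns +1 for nominal points and -1 for anomalies
forest = IsolationForest(contamination=0.05, random_state=0)
labels = forest.fit_predict(X)
anomaly_idx = np.where(labels == -1)[0]                   # indices flagged for review
```

In practice the flagged indices would point an analyst at time slices of bus, video, or audio data worth a closer look.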
16
Dimensionality Reduction
• The goal: Take a high-dimensional dataset and find a “good” representation in a lower dimension (e.g., 2-D).
• Signal decomposition methods:
  − Principal Component Analysis (PCA)
  − Kernel PCA
  − Factor Analysis
  − Non-negative matrix factorization
• Manifold learning:
  − Isomap
  − Locally linear embedding (LLE)
  − Spectral embedding
  − Multi-dimensional scaling (MDS)
  − 𝒕-distributed Stochastic Neighbor Embedding (𝒕-SNE)
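Using scikit-learn (which the deck cites), the goal above can be sketched in a few lines; the synthetic data, with one deliberately high-variance direction, is only illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 50))   # 500 samples in 50 dimensions
X[:, 0] *= 10.0                  # give one direction most of the variance

# project down to a 2-D representation
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# how much of the original variance the first component kept
kept = pca.explained_variance_ratio_[0]
```

The first principal component recovers the high-variance direction, which is exactly what a variance-based method should do.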
17
𝒕-distributed Stochastic Neighbor Embedding
• PCA is variance based.
• If the structure in the high-dimensional space lies on a non-linear manifold, PCA will not work well.
(Vanderplas, scikit-learn [2016])
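scikit-learn also ships a t-SNE implementation; a minimal sketch on a small sample of the bundled digits dataset (the subset size and perplexity are illustrative choices, and t-SNE is much slower than PCA on large data):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]   # subsample to keep t-SNE fast

# embed the 64-dimensional digit images into 2-D
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```

Plotting `emb` colored by `y` typically shows the digit classes separating into clusters, which is the effect the following slides illustrate.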
18
𝒕-distributed Stochastic Neighbor Embedding
(van der Maaten [2016])
22
𝒕-distributed Stochastic Neighbor Embedding
[Figure: 2-D embeddings of the MNIST digits 0–9 produced by t-SNE, Isomap, Sammon mapping, and Locally Linear Embedding]
(van der Maaten [2008])
23
Clustering
• The goal: Partition a dataset to maximize similarity within each partition.
• Connectivity-based / hierarchical clustering
  − Single Linkage Clustering (SLINK)
• Centroid-based clustering
  − 𝒌-means++
  − 𝒌-medians
• Density-based clustering
  − Density-based spatial clustering of applications with noise (DBSCAN)
• Distribution-based clustering
  − Gaussian Mixture Models
24
𝒌-means
• Randomly draw cluster centroids
• Until the clustering remains unchanged:
  − Assign points to the nearest centroid
  − Calculate new centroids
• Output clustering
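The loop above can be sketched in a few lines of NumPy (a minimal illustration on made-up two-blob data; 𝑘 and the data are assumptions, and a production implementation would also handle empty clusters and multiple restarts):

```python
import numpy as np

def kmeans(X, k, rng):
    # randomly draw cluster centroids from the data points
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = None
    while True:
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # stop when the clustering remains unchanged
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centroids
        labels = new_labels
        # calculate new centroids as the mean of each cluster
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),   # blob near (0, 0)
               rng.normal(5.0, 0.5, (50, 2))])  # blob near (5, 5)
labels, centroids = kmeans(X, 2, rng)
```

On well-separated blobs like these, the loop converges in a handful of iterations regardless of the random start.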
33
Supervised Learning Tasks
• Classification
  − Output is discrete (e.g., speech recognition, image classification)
• Regression
  − Output is continuous
34
Neural Networks
• A neuron
• Mathematical model for a neuron:
  𝑎𝑗 = 𝑔(Σ𝑖 𝑤𝑖𝑗 𝑎𝑖), with bias input 𝑎0 = 1
(Russell, Norvig [2010])
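The model above in a few lines of NumPy: the output is 𝑎𝑗 = 𝑔(Σ𝑖 𝑤𝑖𝑗 𝑎𝑖), with 𝑎0 = 1 acting as the bias input. The logistic activation and the sample numbers are illustrative:

```python
import numpy as np

def g(z):
    # logistic activation; other choices (threshold, tanh, ReLU) also fit the model
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights):
    a = np.concatenate(([1.0], inputs))   # prepend the bias input a_0 = 1
    return g(np.dot(weights, a))          # g of the weighted sum of inputs

# weights are (w_0j, w_1j, w_2j); inputs are (a_1, a_2)
out = neuron(np.array([0.5, -0.2]), np.array([0.1, 0.4, 0.3]))
```

Here the weighted sum is 0.1 + 0.4·0.5 + 0.3·(−0.2) = 0.24, and the output is 𝜎(0.24) ≈ 0.56.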
35
Perceptrons
[Diagram: inputs 𝑥1, 𝑥2, 𝑥3 connected to output 𝑦 through weights 𝑤1, 𝑤2, 𝑤3]
• All inputs connected directly to outputs
• Error function:
  𝐸 = ½ (𝑡 − 𝑦)²
• Update weights with each training case:
  − Output unit is a threshold unit:
    Δ𝑤𝑖 = 𝜖 (𝑡 − threshold(𝒘⊤𝒙)) 𝑥𝑖
  − Output unit is a logistic unit, 𝑦 = 𝜎(𝒘⊤𝒙):
    Δ𝑤𝑖 = −𝜖 ∂𝐸/∂𝑤𝑖 = 𝜖 (𝑡 − 𝑦) 𝑦 (1 − 𝑦) 𝑥𝑖
(Russell, Norvig [2010])
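The threshold-unit update above is easy to run end to end; a minimal sketch that learns logical AND, which is linearly separable (the learning rate, epoch count, and task are illustrative choices):

```python
import numpy as np

def threshold(z):
    return (z >= 0).astype(float)

def train_perceptron(X, t, epochs=50, eps=0.1):
    Xb = np.hstack([np.ones((len(X), 1)), X])   # bias column x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(Xb, t):
            y_i = threshold(w @ x_i)
            w += eps * (t_i - y_i) * x_i        # the update rule from the slide
    return w

# logical AND: output 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w = train_perceptron(X, t)
preds = threshold(np.hstack([np.ones((4, 1)), X]) @ w)
```

Because AND is linearly separable, the weights stop changing after a few epochs and the learned unit classifies all four cases correctly; XOR, by contrast, would never converge, which motivates the multi-layer perceptron on the next slide.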
36
Multi-layer Perceptron
• Better performing than a single-layer feed-forward neural network
• Trained with backpropagation
[Diagram: inputs 𝑥1, 𝑥2, 𝑥3 feed hidden units ℎ1, ℎ2 through weights 𝑤1–𝑤6; the hidden units feed output 𝑦 through weights 𝑤7, 𝑤8]
37
Image Text Recognition
• Over-the-shoulder videos are common data sources
  − Cheap to implement
  − Processing is time intensive
• ANNs can help
(Karpathy [2015]) (Shi, et al. [2016])
38
Convolutional Neural Network
• Typically used for image classification
• An RGB image can be thought of as a 3-D array (height × width × color channels)
• Fully connected hidden layers would require too many weights
• The forward pass: pass a filter over the image
(Hinton [2013])
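“Pass a filter over the image” can be written out directly in NumPy; a minimal single-channel sketch (a valid-mode cross-correlation, which is what most deep learning libraries call convolution; the image and filter values are made up):

```python
import numpy as np

def conv2d(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    # slide the filter over every position where it fits entirely in the image
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)   # a tiny "image" with a constant gradient
edge = np.array([[1.0, -1.0]])                   # horizontal difference filter
fmap = conv2d(img, edge)                         # the resulting feature map
```

Because weights are shared across every position, a CNN layer needs only `kh * kw` parameters per filter instead of one weight per input pixel, which is exactly why it sidesteps the fully-connected weight blowup noted above.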
48
Hyperspectral Classification
• Per-pixel classification from hyperspectral data
• Data from https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html
49
CNNs on MNIST
• Misclassifications of LeNet-5
(LeCun [1998])
50
Recurrent Neural Networks
• Directed cycles in their connection graph
• MLPs and CNNs require fixed-size input
• Used to model sequential data
• Hard to train
[Diagram: an input layer, hidden layer, and output layer unrolled over time steps 𝑡1–𝑡6, with the hidden layer feeding back into itself]
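A minimal forward pass shows why the cycle matters: the hidden state feeds back into itself, so one set of weights handles a sequence of any length. All sizes and the random weights below are illustrative:

```python
import numpy as np

rng = np.random.RandomState(0)
W_xh = rng.normal(scale=0.1, size=(4, 3))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(4, 4))   # hidden -> hidden (the directed cycle)
W_hy = rng.normal(scale=0.1, size=(2, 4))   # hidden -> output

def rnn_forward(xs):
    h = np.zeros(4)                          # hidden state starts at zero
    ys = []
    for x in xs:                             # one pass per time step t1, t2, ...
        h = np.tanh(W_xh @ x + W_hh @ h)     # new state depends on the old state
        ys.append(W_hy @ h)
    return np.array(ys)

seq = rng.normal(size=(6, 3))                # a length-6 sequence of 3-D inputs
outs = rnn_forward(seq)                      # one 2-D output per time step
```

The same three weight matrices would process a length-100 sequence unchanged, which is what lets RNNs model variable-length audio or bus-data streams.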
51
Recurrent Neural Networks
(Karpathy [2015])
52
Image Text Recognition
• Different ways of thinking about the problem
• A Long Short-Term Memory layer was used most recently
(Karpathy [2015]) (Shi, et al. [2016])
53
Audio-speech Recognition
• Traditional speech models chain three components:
  Speech waveform → Acoustic Model → Pronunciation Model → Language Model → Sentence
  (phonemes, e.g. “kaˈfā” → words, e.g. “cafe”)
• With 𝑋 the audio features, 𝐿 the phoneme sequence, and 𝑊 the word sequence:
  argmax𝑊 𝑃(𝑊|𝑋) = argmax𝑊,𝐿 𝑃(𝑋|𝐿) 𝑃(𝐿|𝑊) 𝑃(𝑊)
(Beaufays [2016])
54
Other Acoustic Models
• Other DNN-based approaches to acoustic modeling (Beaufays [2016]):

  Method                                        Year
  DBN                                           2012
  Long Short-Term Memory (LSTM)                 2013
  Convolutional LSTM DNN                        2014
  Connectionist Temporal Classification (CTC)   2015
55
Summary of Applications
• Audio to text
  − Transcribe in-flight audio/conversations
  − Transcribe survey conversations
  − Easily slew to audio of interest
• Image captioning
  − Write text in an image to a text file
  − In-flight data
• Object recognition in images
  − Help label truth data when testing sensors
• Video just adds a time dimension to images
  − Techniques from images may be applied to video
56
Next Steps
• Low-hanging fruit
  − Use existing open-source text recognition for images/video
    OpenCV
  − Use free audio transcription software
    TensorFlow (Google)
    SwiftScribe (Baidu)
57
Next Steps
• Open areas for development:
  − Transcribing acronyms
  − Using machine learning on bus data to alert a maintainer to a specific risk
  − ATC radar more accurately narrowing down aircraft location in real time (Hrastovec et al. [2014])
  − Identifying early indications of airframe stress and strain (Hickinbotham et al. [2000])
58
Acknowledgements
• Workshop organizers
• AFOTEC Det 5 leadership
• Mr. Jeff Wilson
• Capt Joshua Vaughan
59
References
• Hinton, Geoffrey. Artificial Neural Networks. Coursera (2013).
• ImageNet (2014). http://www.image-net.org/challenges/LSVRC/2014/ui/det.html
• Karpathy, Andrej, et al. CS231n online course notes: http://cs231n.stanford.edu/
• Karpathy, Andrej. RNN github page (2015): http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• LeCun, Yann. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE (1998).
• MATLAB documentation (2017): https://www.mathworks.com/discovery/support-vector-machine.html
• van der Maaten. 𝒕-SNE github page (2016): https://lvdmaaten.github.io/tsne/
• van der Maaten, Hinton. “Visualizing Data using 𝒕-SNE.” JMLR (2008).
• Russell, Norvig. Artificial Intelligence: A Modern Approach. 3rd Ed. (2010). New Jersey: Pearson.
• scikit-learn documentation (2016). http://scikit-learn.org/stable/documentation.html
• Weisberg, S. (1985). Applied Linear Regression, Second Edition. New York: John Wiley and Sons.
• Wolberg, W.H., & Mangasarian, O.L. (1990). “Multisurface method of pattern separation for medical diagnosis applied to breast cytology.” Proceedings of the National Academy of Sciences, 87, 9193–9196.
60
References
• Shi, Baoguang, Xiang Bai, and Cong Yao. “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
• Ghamisi, Pedram, et al. “Advanced Supervised Spectral Classifiers for Hyperspectral Images: A Review.” IEEE Geoscience and Remote Sensing Magazine (GRSM) (2017).
• Dahl, George E., Dong Yu, Li Deng, and Alex Acero. “Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing (2012).
• Gupta, Manish, et al. “Outlier Detection for Temporal Data: A Survey.” IEEE Transactions on Knowledge and Data Engineering (2014).
• Beaufays, Françoise. “Speech Recognition.” Google I/O (2016).
• Yoon, Seunghyun, et al. “Efficient Transfer Learning Schemes for Personalized Language Modeling using Recurrent Neural Network.” arXiv preprint arXiv:1701.03578 (2017).
61
Questions?