Introduction to Embedded Systems Research: Machine Learning in the Internet-of-Things
Robert Dick, [email protected]
Department of Electrical Engineering and Computer Science, University of Michigan

Outline
1. Internet-of-Things
2. Machine learning
3. Research directions
4. Deadlines and announcements

Applications
Smart city: $350B (Markets and Markets).
Smart homes: $31B (Statista).
Wearables: $30B (Markets and Markets).
Connected vehicles: $60B (Sheer Analytics and Insights).
Networked manufacturing.
Networked agriculture.
Networked medical care.
Smart retail and supply chain.
Environmental management.

Wireless communication standards
Technology          Power (mW)  Range (m)  Typical rate (kb/s)
4G                  1,000       70,000     10,000
5G                  1,000       40,000     100,000
WiFi / 802.11(g)    250         140        20,000
Zigbee / 802.15.4   1–100       10–1,500   20–200
LoRaWAN             10          15,000     20
NB-IoT              100         15,000     250
Structural relationship with quality
What works better?
More or fewer?
Denser or less dense?
Aligned or unaligned?
High-power or low-power?
E. Genc, C. Fraenz, C. Schluter, P. Friedrich, R. Hossiep, M. C. Voelkle, J. M. Ling, O. Gunturkun, and R. E. Jung, “Diffusion markers of dendritic density and arborization in gray matter predict differences in intelligence,” Nature Communications, vol. 9, 2018.
Optimization objectives?
The brain: roughly 2% of body mass, about 20% of resting power.
Probably power-constrained, estimated-quality constrained, and energy-minimized.
Receptive field
[Figure: receptive field profiles, intensity vs. position, at signal-to-noise ratios S/N = 10, 2, and 0.1.]
J. J. Atick and A. N. Redlich, “Toward a theory of early visual processing,” Neural Computation, vol. 2, pp. 308–320, 1990.
Fish adaptation
J. E. Niven and S. B. Laughlin, “Energy limitation as a selective pressure on the evolution of sensory systems,” J. Experimental Biology, vol. 211, pp. 1792–1804, Apr. 2008.
Insect signal processing
Roughly 5× total power consumption variation within a species.
Why can’t they gate?
Section outline
2. Machine learning
   Classification and context
   Machine learning
   Neural networks perspective 1: biomimetic computation
   Neural networks perspective 2: function approximation
Linear regression
Set parameters of hyperplane to minimize cost function relative to data.
Minimizing (root) mean squared error is common.
Fails for non-linear functions.
Non-linear regression is also possible, but it generally complicates optimization.
Neural networks are just an efficient way to approximate functions.
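As a concrete illustration (a minimal sketch, not from the lecture; the data are synthetic), minimizing mean squared error for a hyperplane fit has a closed-form least-squares solution:

import numpy as np

# Synthetic data: 100 samples, 3 features (values are made up for the example).
X = np.random.randn(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + 0.1 * np.random.randn(100)

# Append a column of ones so the bias is one more hyperplane parameter.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution minimizes mean squared error.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("fit parameters:", w)
print("MSE:", np.mean((Xb @ w - y) ** 2))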
Neural networks with linear activation functions
Can approximate linear functions well.
Number of layers does not change possible set of functions.
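A quick numerical check of this point (an illustrative sketch, not from the lecture): with linear activations, stacked layers collapse into a single linear map, so depth adds no expressive power.

import numpy as np

W1 = np.random.randn(4, 3)          # layer 1 weights
W2 = np.random.randn(2, 4)          # layer 2 weights
x = np.random.randn(3)

two_layer = W2 @ (W1 @ x)           # two "layers" with linear activations
one_layer = (W2 @ W1) @ x           # one equivalent linear layer
print(np.allclose(two_layer, one_layer))   # True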
Neural networks with non-linear activation functions
Can approximate non-linear functions.
Increasing number of layers improves function encoding efficiency.
However, this can undermine training.
See S. Li, J. Jiao, Y. Han, and T. Weissman, “Demystifying ResNet,” arXiv 1611.01186, 2016.
Neural networks
What does it do?
Computes non-linear functions of inputs via forward propagation.
Two or more layers sufficient for any Boolean function, but hard to train.
Under some circumstances, deeper networks easier to train.
How to train?
Math for one synthetic neuron
y = f\left(\sum_{k=1}^{n} i_k w_k\right)
n: number of inputs
i : inputs
w : input weights
f (): activation function, commonly tanh()
y : output
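A direct transcription of this formula (a minimal sketch; variable names follow the symbols above):

import numpy as np

def neuron(i, w, f=np.tanh):
    # Output of one synthetic neuron: y = f(sum_k i_k * w_k).
    return f(np.dot(i, w))

i = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.3])   # input weights
print(neuron(i, w))              # y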
Training: gradient descent via randomized weight changes
Change a weight.
Check whether the results improved on training set.
If so, keep the change.
Too slow for most applications.
Why?
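A sketch of this randomized procedure on a made-up toy problem (illustrative only), which also shows why it is slow: every random perturbation requires a full pass over the training set and improves at most one weight at a time.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))      # synthetic targets
w = rng.normal(size=3)                            # weights to train

def loss(w):
    return np.mean((np.tanh(X @ w) - y) ** 2)     # evaluate on the whole training set

best = loss(w)
for _ in range(5000):                             # many trials needed
    k = rng.integers(3)                           # pick one weight
    trial = w.copy()
    trial[k] += rng.normal(scale=0.1)             # change it randomly
    trial_loss = loss(trial)
    if trial_loss < best:                         # keep the change if it helped
        w, best = trial, trial_loss
print("final training loss:", best)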
Training: back-propagation
Forward propagate training sample.
Determine error at outputs relative to correct output.
Propagate backward to compute node-level errors.
For each weight:
Multiply output error by input to find gradient.
Subtract α times the gradient from the weight.
0 ≤ α ≤ 1.
This updates many weights every training event.
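The per-weight update step amounts to the following (a toy scalar example; all values are made up):

alpha = 0.1          # learning rate, 0 <= alpha <= 1
a_in = 0.7           # input activation feeding this weight
delta_out = -0.2     # back-propagated error at this weight's neuron
w = 0.5              # current weight value

grad = delta_out * a_in    # gradient of the cost w.r.t. this weight
w -= alpha * grad          # subtract alpha times the gradient
print(w)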
Back-propagation fundamentals
To enable rapid convergence for high-dimensional networks, use gradient descent in the weights.
Intuition: efficiently compute the change in all neurons' outputs w.r.t. changes in weights, potentially in prior layers.
Chains pose a problem: exponential number of paths through the network.
Use the chain rule to express each gradient as a function of those in the subsequent layer only.
But there is more to it than this...
Linear in depth: every layer depends only on the subsequent layer.
Quadratic in width for fully-connected layers.
Study Lagrange multipliers and see Y. le Cun, “A theoretical framework for back-propagation,” in Proc. Connectionist Models Summer School, 1988, pp. 21–28.
Chain rule
F(x) = f(g(x))
F'(x) = f'(g(x)) g'(x)
Back-propagation equations I
\delta^L = \nabla_a C \odot \sigma'(z^L)
\delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l)
\frac{\partial C}{\partial b^l_j} = \delta^l_j
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j

δ^L is the vector of errors associated with the output layer, L.
∇_a C is the vector of partial derivatives of the cost function w.r.t. the output activations.
⊙ is the Hadamard (element-wise) product.
Back-propagation equations II
σ′(z^L_j) is the rate of change of the activation function σ at z^L_j.
δ^l is the vector of errors associated with layer l.
w^{l+1} is the weight matrix associated with layer l+1.
b^l_j is the bias of neuron j in layer l.
w^l_{jk} is the row-j, column-k entry of the weight matrix for layer l.
a^l_j is the activation of the jth neuron in layer l.
Following M. Nielsen’s notation.
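These four equations translate almost line-for-line into numpy. The following is a minimal sketch (not code from the lecture) for a fully-connected network, assuming the forward pass has already stored per-layer activations and weighted inputs:

import numpy as np

def sigma(z):                     # activation function
    return np.tanh(z)

def sigma_prime(z):               # its derivative
    return 1.0 - np.tanh(z) ** 2

def backprop(weights, activations, zs, grad_aC):
    # weights[l]: weight matrix of layer l (0-indexed here).
    # activations[0] is the input; activations[l+1] and zs[l] come from forward propagation.
    # grad_aC: dC/da at the output layer.
    L = len(weights)
    grads_w, grads_b = [None] * L, [None] * L
    delta = grad_aC * sigma_prime(zs[-1])                 # delta^L
    grads_b[-1] = delta
    grads_w[-1] = np.outer(delta, activations[-2])        # dC/dw_jk = a^{l-1}_k delta^l_j
    for l in range(L - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigma_prime(zs[l])   # delta^l
        grads_b[l] = delta
        grads_w[l] = np.outer(delta, activations[l])
    return grads_w, grads_b

# Toy usage: a 3-4-2 network with random parameters and one training input.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
a, activations, zs = rng.normal(size=3), [], []
activations.append(a)
for W, b in zip(weights, biases):
    z = W @ a + b
    zs.append(z)
    a = sigma(z)
    activations.append(a)
grad_aC = activations[-1] - np.zeros(2)    # quadratic cost with an all-zero target
gw, gb = backprop(weights, activations, zs, grad_aC)
print([g.shape for g in gw])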
Deep learning
No universally agreed upon definition.
Using many-layer non-linear transformations to extract information of increasingly higher levels of abstraction from feature vectors or raw data.
Or learning parameters of a statistical model using multiple hidden layers between input and output.
What changed?
Back-propagation was important (’70s and ’80s for application to NN).
Most fundamentals from ’80s and ’90s.
Computers became fast enough for naïve implementations to work.
Sufficient data became available in some areas to enable training.
Examples
https://playground.tensorflow.org
Incomplete connectivity
Convolutional neural networks
First layers apply learned convolution kernels to input.
Heavily used for vision applications.
Convolution
K = \begin{bmatrix} k^0_0 & k^1_0 & k^2_0 \\ k^0_1 & k^1_1 & k^2_1 \\ k^0_2 & k^1_2 & k^2_2 \end{bmatrix}

\forall_{i=1}^{i \le w-2}\ \forall_{j=1}^{j \le w-2}: \quad g^{i-1}_{j-1} = \sum_{m=0,\,n=0}^{m \le 2,\,n \le 2} k^m_n \cdot p^{i+m}_{j+n}
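A direct, unoptimized rendering of this operation in code (a sketch; it assumes a 3×3 kernel and a 2-D grayscale image, both made up):

import numpy as np

def conv3x3(p, k):
    # Apply 3x3 kernel k to image p; the valid output shrinks by 2 in each dimension.
    h, w = p.shape
    g = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            g[i, j] = np.sum(k * p[i:i + 3, j:j + 3])
    return g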
Convolution example
Vertical edge detection
K = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}
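Applying this kernel with the conv3x3 sketch above to a made-up image containing a vertical step produces large responses only near the edge:

img = np.zeros((6, 6))
img[:, 3:] = 1.0                       # right half bright: one vertical edge
K = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0]])
print(conv3x3(img, K))                 # nonzero columns mark the edge location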
Blurring
K = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}
CNN
First-layer convolutional kernels
Feature extraction
Old way of thinking
Feature extraction is distinct from the rest of the machine learning process.
Humans carefully determine ideal features.
Statistical learning techniques determine how to partition space based on training examples.
New way of thinking
Each stage of machine learning increases the level of abstraction.
Going from pixels to an edge.
Going from edges to a parallelogram.
Can automatically learn features from raw data, at a computational cost.
Temporal sequences: unrolling
Select n discrete time steps into the past.
For each, provide additional inputs.
Implications: (often very) superlinear increase in network size.
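A simple way to realize this for a 1-D signal and a feed-forward network (a sketch with made-up data): give the network, at each time step, the n most recent past samples as extra inputs.

import numpy as np

def unroll(signal, n):
    # Stack each sample with its n predecessors as extra input features.
    windows = [signal[t - n:t + 1] for t in range(n, len(signal))]
    return np.array(windows)            # shape: (len(signal) - n, n + 1)

x = np.sin(np.linspace(0, 10, 50))      # synthetic temporal signal
print(unroll(x, 4).shape)               # (45, 5): 5 inputs per time step instead of 1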
Outline
1. Internet-of-Things
2. Machine learning
3. Research directions
4. Deadlines and announcements
Efficient algorithms
Omit computation on useless input data.
Eliminate redundancy within analysis networks:
Prune near-zero-weight edges after training.
Consider only local relationships at some network depths.
More efficient training methods, e.g., residual networks.
Reduce edge weight precision (see the pruning and quantization sketch after this list).
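Two of these ideas in miniature, on a random weight matrix (an illustrative sketch, not a method from the lecture): magnitude pruning and 8-bit weight quantization.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)     # stand-in for a trained weight matrix

# Magnitude pruning: zero out near-zero weights.
threshold = 0.2
W_pruned = np.where(np.abs(W) < threshold, 0.0, W)
print("nonzero after pruning:", np.count_nonzero(W_pruned), "/", W.size)

# Precision reduction: quantize remaining weights to 8-bit integers.
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.round(W_pruned / scale).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale      # values used at inference time
print("max quantization error:", np.abs(W_dequant - W_pruned).max())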
Biomimetic signal processing
Pattern electronic systems using design idioms from energy-efficient biological systems.
Mimic structure of application-specific signal processing pipelines.
Application-specific hardware
Mirror network structure with hardware.
Enable parallel operations.
Improve communication latency and throughput among algorithmically adjacent computational elements.
Specialized devices that efficiently mimic neural properties.
Note that even the brain uses digital encoding for long-range transmission.
Partitioned networks
Part of the analysis runs on edge devices, part in the cloud.
Designing algorithms with low-cost cut points is challenging.
Could adapt to changing computation/communication costs.