
Introduction to Embedded Systems Research: Machine Learning in the Internet-of-Things

Robert Dick

[email protected]
Department of Electrical Engineering and Computer Science

University of Michigan


Outline

1. Internet-of-Things

2. Machine learning

3. Research directions

4. Deadlines and announcements


Applications

Smart city: $350B (Markets and Markets).

Smart homes: $31B (Statista).

Wearables: $30B (Markets and Markets).

Connected vehicles: $60B (Sheer Analytics and Insights).

Networked manufacturing.

Networked agriculture.

Networked medical care.

Smart retail and supply chain.

Environmental management.


Wireless communication standards

Technology           Power (mW)   Range (m)   Typical rate (kb/s)
4G                   1,000        70,000      10,000
5G                   1,000        40,000      100,000
WiFi / 802.11(g)     250          140         20,000
Zigbee / 802.15.4    1–100        10–1,500    20–200
LoRaWAN              10           15,000      20
NB-IoT               100          15,000      250


Energy efficiency I

ARM Cortex A57

Mid-range IoT application processor.

7.1 W at 1.9 GHz.

64-bit processor.

4 MIPS/MHz → 7.6 GIPS.

Average instruction duration: 130 ps.

930 pJ/word.

15 pJ/bit.
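These figures are mutually consistent; a quick arithmetic check using only the numbers above:

\[
4\,\text{MIPS/MHz} \times 1900\,\text{MHz} = 7.6\,\text{GIPS},
\qquad
\frac{1}{7.6 \times 10^{9}\,\text{s}^{-1}} \approx 130\,\text{ps},
\]
\[
\frac{7.1\,\text{W}}{7.6 \times 10^{9}\,\text{inst/s}} \approx 930\,\text{pJ per 64-bit word},
\qquad
\frac{930\,\text{pJ}}{64\,\text{bits}} \approx 15\,\text{pJ/bit}.
\]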


Energy efficiency II

LoRaWAN

10 mW.

20 kb/s.

0.5 µJ/b.

34,000 bit computations per bit transmission.

MICAz was 625×.
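The headline ratio follows from the two preceding slides (using the unrounded ≈14.5 pJ/bit for the Cortex A57):

\[
\frac{10\,\text{mW}}{20\,\text{kb/s}} = 0.5\,\mu\text{J/b},
\qquad
\frac{0.5\,\mu\text{J/b}}{14.5\,\text{pJ/b}} \approx 34{,}000.
\]

That is, roughly 34,000 bits' worth of computation cost the same energy as transmitting a single bit over LoRaWAN.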


Reliability and security

All the problems we learned about for other embedded systems, plus...

Large attack surface: sensors, algorithms, networks, and actuators.

No single company knows the entire system design: formal methods are impossible.

Large-scale system composed of heterogeneous components.

Fault processes are interdependent due to indirect coupling (environmental and social).

Identifying catastrophic system failure modes is akin to predicting failures in financial systems, not isolated embedded systems.

Manually characterizing indirect relationships among IoT component fault processes is impractical.


Computational platforms

Low-power GPUs likely to become available and common.

Semi-custom accelerators for limited-width parallel MAC operations.

Analog weight state memories.



Section outline

2. Machine learning
   Classification and context
   Machine learning
   Neural networks perspective 1: biomimetic computation
   Neural networks perspective 2: function approximation


Classification

Determining which class a new observation falls into.


Easy example

Student: 0.72, Protester: 0.14, Cat: 0.08, ...


Easy example

Cat: 0.81, Student: 0.03, ...


Moderate complexity example

Cat: 0.61, Student: 0.34, Protester: 0.14, ...


Impossible example

Cat: 0.48, Student: 0.45, ...


Manual algorithm design: feature extraction

Bounding box using eye detection.

Hairiness feature using edge detection: scalar.

Image segmentation based on color.

Pose classification based on segment shapes: vector.

Image region color histograms: vector.


Bounding box using eye detection



Hairiness feature



Color histogram


Principal component analysis

Avoid doing this in a production system.

Valuable during learning process.

Transforms a data set from an input space to an output space in which dimensions are orthogonal and are ordered by decreasing variance.

Too many dimensions to plot.

Truncation can be useful for data visualization.
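A minimal sketch of PCA via the singular value decomposition, using only NumPy; the data here is random and purely illustrative:

import numpy as np

def pca(X, n_components=2):
    # Project rows of X onto the leading principal components.
    Xc = X - X.mean(axis=0)                      # center each dimension
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # orthogonal axes, decreasing variance

X = np.random.randn(500, 10)    # 500 samples, 10 dimensions: too many to plot
X2 = pca(X, n_components=2)     # truncate to two components for visualization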


Two dimensions of original data (there are many more)


First two principal components


Cluster overlap in feature space


Machine learning for classification example

Given a feature vector, or raw data, for a new sample, determine the class.

Training

Given: set of training samples labeled by class.

Learn statistical model structure and parameters to correctly classify these points.

Avoid overtraining: modeling the training data set so precisely that it fails on other data.

Use

Given: set of unlabeled samples.

Use model to determine classes of samples.


Example algorithm: K-means

Goal

\[
\operatorname*{argmin}_{S} \sum_{i=1}^{k} |S_i| \operatorname{var} S_i,
\]

where k is the number of means, S is the set of sets S_i, and var is the variance of S_i.

Simple.

Works on many real-world problems.

Position k centroids such that the within-cluster sum of square distances is minimal.

Partitions feature space into Voronoi cells.


K-means implementation

NP-hard.

Heuristics typically used.

E.g., iteratively assign samples to the nearest (Euclidean distance) centroid and update the centroid, as in the sketch below.
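A minimal NumPy sketch of this heuristic (Lloyd's algorithm); the initialization and stopping rule are illustrative choices:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # Lloyd's heuristic: alternate assignment and centroid-update steps.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial means
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment: each sample goes to the nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update: each centroid becomes the mean of its assigned samples.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged to a local optimum of the objective above
        centroids = new
    return centroids, labels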


K-means


Voronoi cells

Credit: National Instruments.


K-means requirements and limitations

Must have naturally separated clusters.

Cell thresholds based on global (not local, learned) distance function.

Concave shapes cannot be trivially modeled, but can be approximated with k higher than the cluster count.

Must know k, although it can be learned from the shape of cost(k) (e.g., the "elbow" heuristic).


K-means error


Decision trees

Iteratively split on feature maximizing entropy reduction.

\[
\operatorname*{argmin}_{f \in F} \sum_{c \in C_f} -p(c) \log_2 p(c),
\]

where F is the set of features and C_f is the set of choices for feature f.

These are heavily prone to overfitting.
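A minimal sketch of entropy-based split selection, assuming categorical features indexed by position; weighting each branch by its size is the standard refinement of the formula above:

import math
from collections import Counter

def entropy(labels):
    # H = sum over classes c of -p(c) * log2 p(c).
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())

def best_split(samples, labels, features):
    # Pick the feature whose split minimizes the size-weighted entropy of its branches.
    def split_entropy(f):
        branches = {}
        for x, y in zip(samples, labels):
            branches.setdefault(x[f], []).append(y)
        n = len(labels)
        return sum(len(b) / n * entropy(b) for b in branches.values())
    return min(features, key=split_entropy)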


Random forests

Train trees with random, overlapping subsets of training data.

Evaluate all trees.

Average or majority vote using the results.
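A minimal usage sketch with scikit-learn, assuming it is installed; the data here is random and purely illustrative:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(200, 5)                  # 200 samples, 5 features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # labels from a simple rule

# Each tree trains on a bootstrap (random, overlapping) sample of the data;
# the forest predicts by majority vote across trees.
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(forest.predict(X[:5]))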


Neuron


Structure

Many-layer redundant network.

Dendrite to axon flow is typical.

Cell body has a diameter of 4–100 µm.

Axons are generally 10,000 times as long as the cell body diameter.

Axons move away from low-activity synapses to form other synapses.


Signalling

Digital and analog exist.

Digital used for longer distance propagation.

Probably to resist noise.

Analog used for shorter distance propagation.

Digital signalling generates discrete pulses by dumping ions into the synapse.

Most power is consumed running the ion pump to fight leakage.

Frequency of firing indicates intensity.

Fires when threshold reached.

Near-simultaneous stimuli move farther toward threshold.


Signalling

Integrate input delta functions, considering arrival times.

Fire when threshold reached.

Idle for refractory period.


Structural relationship with quality

What works better?

More or fewer?

Denser or less dense?

Aligned or unaligned?

High-power or low-power?

E. Genc, C. Fraenz, C. Schluter, P. Friedrich, R. Hossiep, M. C. Voelkle, J. M. Ling, O. Gunturkun, and R. E. Jung, "Diffusion markers of dendritic density and arborization in gray matter predict differences in intelligence," Nature Communications, vol. 9, 2018.


Optimization objectives?

The brain: 2% of body mass, 20% of power.

Probably

power-constrained,

estimated quality constrained, and

energy-minimized.


Receptive field

[Figure: receptive-field profiles, intensity vs. position, at three signal-to-noise ratios: S/N = 10, S/N = 2, and S/N = 0.1.]

J. J. Atick and A. N. Redlich, "Toward a theory of early visual processing," Neural Computation, vol. 2, pp. 308–320, 1990.


Fish adaptation

J. E. Niven and S. B. Laughlin, "Energy limitation as a selective pressure on the evolution of sensory systems," J. Experimental Biology, vol. 211, pp. 1792–1804, Apr. 2008.


Insect signal processing

Roughly 5× total power consumption variation within a species.

Why can’t they gate?


Linear regression

Set parameters of hyperplane to minimize cost function relative to data.

Minimizing (root) mean squared error is common.

Fails for non-linear functions.

Non-linear also possible, but this generally complicates optimization.

Neural networks are just an efficient way to approximate functions.
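A minimal least-squares sketch in NumPy: fitting a hyperplane by minimizing mean squared error (the data and names are synthetic and illustrative):

import numpy as np

X = np.random.randn(100, 3)                     # 100 samples, 3 features
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * np.random.randn(100)     # noisy linear data

Xb = np.hstack([X, np.ones((100, 1))])          # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)      # hyperplane minimizing squared error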


Neural networks with linear activation functions

Can well approximate linear functions.

Number of layers does not change possible set of functions.


Neural networks with non-linear activation functions

Can approximate non-linear functions.

Increasing number of layers improves function encoding efficiency.

However, this undermines training.

See S. Li, J. Jiao, Y. Han, and T. Weissman, "Demystifying ResNet," arXiv 1611.01186, 2016.


Neural networks


What does it do?

Computes non-linear functions of inputs via forward propagation.

Two or more layers sufficient for any Boolean function, but hard to train.

Under some circumstances, deeper networks easier to train.

How to train?


Math for one synthetic neuron

\[
y = f\left(\sum_{k=1}^{n} i_k w_k\right)
\]

n: number of inputs

i: inputs

w: input weights

f(): activation function, commonly tanh()

y: output
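A direct transcription of this equation in NumPy; the input and weight values are illustrative:

import numpy as np

def neuron(i, w, f=np.tanh):
    # y = f(sum_{k=1}^{n} i_k * w_k)
    return f(np.dot(i, w))

i = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.2])   # input weights
y = neuron(i, w)                 # output, in (-1, 1) for tanh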


Training: gradient descent via randomized weight changes

Change a weight.

Check whether the results improved on training set.

If so, keep the change.

Too slow for most applications.

Why?


Training: back-propagation

Forward propagate training sample.

Determine error at outputs relative to correct output.

Propagate backward to compute node-level errors.

For each weight

Multiply output error by input to find gradient.

Subtract α times the gradient from the weight.

0 ≤ α ≤ 1.

This updates many weights every training event.


Back-propagation fundamentals

To enable rapid convergence for high-dimension networks, use gradient descent on the weights.

Intuition: efficiently compute the change in all neurons' outputs w.r.t. changes in weights, potentially in prior layers.

Chains pose a problem: exponential number of paths through the network.

Use the chain rule to express each gradient as a function of those in the subsequent layer only.

But there is more to it than this...

Linear in depth: every layer depends only on the subsequent layer.

Quadratic in width for fully-connected layers.

Study Lagrange multipliers and see Y. le Cun, "A theoretical framework for back-propagation," in Proc. Connectionist Models Summer School, 1988, pp. 21–28.


Chain rule

F(x) = f(g(x))

F'(x) = f'(g(x)) g'(x)


Back-propagation equations I

\[
\delta^L = \nabla_a C \odot \sigma'(z^L)
\]
\[
\delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l)
\]
\[
\frac{\partial C}{\partial b^l_j} = \delta^l_j
\]
\[
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j
\]

$\delta^L$ is the vector of errors associated with the output layer, L.

$\nabla_a C$ is the vector of the partial derivatives of the cost function w.r.t. the output activations.

$\odot$ is the Hadamard (element-wise) product.


Back-propagation equations II

$\sigma'(z^L_j)$ is the rate of change of the activation function $\sigma$ at $z^L_j$.

$\delta^l$ is the vector of errors associated with layer l.

$w^{l+1}$ is the weight matrix associated with layer l+1.

$b^l_j$ is the bias for neuron j in layer l.

$w^l_{jk}$ is the row j, column k weight for layer l.

$a^l_j$ is the activation of the jth neuron in layer l.

Following M. Nielsen’s notation.
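A compact NumPy sketch of these four equations for a two-layer network, assuming sigmoid activations and quadratic cost (so the output-layer gradient is a − y); shapes and names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, W1, b1, W2, b2):
    # Forward pass, keeping weighted inputs z and activations a.
    z1 = W1 @ x + b1; a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
    # Output error: delta^L = grad_a C (Hadamard) sigma'(z^L), quadratic cost.
    d2 = (a2 - y) * sigmoid_prime(z2)
    # Back-propagated error: delta^l = ((w^{l+1})^T delta^{l+1}) (Hadamard) sigma'(z^l).
    d1 = (W2.T @ d2) * sigmoid_prime(z1)
    # dC/db^l_j = delta^l_j; dC/dw^l_{jk} = a^{l-1}_k delta^l_j (outer products).
    return {"W1": np.outer(d1, x), "b1": d1, "W2": np.outer(d2, a1), "b2": d2}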


Deep learning

No universally agreed upon definition.

Using many-layer non-linear transformations to extract information of increasingly higher levels of abstraction from feature vectors or raw data.

Or learning parameters of a statistical model using multiple hidden layers between input and output.


What changed?

Back-propagation was important (’70s and ’80s for application to NN).

Most fundamentals from ’80s and ’90s.

Computers became fast enough for naïve implementations to work.

Sufficient data became available in some areas to enable training.


Examples

https://playground.tensorflow.org


Incomplete connectivity


Convolutional neural networks

First layers apply learned convolution kernels to input.

Heavily used for vision applications.


Convolution

\[
K = \begin{pmatrix} k^0_0 & k^1_0 & k^2_0 \\ k^0_1 & k^1_1 & k^2_1 \\ k^0_2 & k^1_2 & k^2_2 \end{pmatrix}
\]

\[
\forall_{i=1}^{i \le w-2} \; \forall_{j=1}^{j \le w-2}: \quad
g^{i-1}_{j-1} = \sum_{m=0,\,n=0}^{m \le 2,\,n \le 2} k^m_n \cdot p^{i+m}_{j+n}
\]
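A minimal NumPy sketch of the operation above. As on the slide, the kernel is applied without flipping (strictly, cross-correlation), which is the usual CNN convention; the test image is illustrative:

import numpy as np

def conv2d_valid(p, K):
    # g[i][j] = sum over (m, n) of K[m][n] * p[i+m][j+n], 'valid' region only.
    kw = K.shape[0]
    rows = p.shape[0] - kw + 1
    cols = p.shape[1] - kw + 1
    g = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            g[i, j] = np.sum(K * p[i:i + kw, j:j + kw])
    return g

# The vertical edge detector from the next slide, applied to a two-tone image.
K = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
img = np.zeros((8, 8)); img[:, 4:] = 1.0   # dark left half, bright right half
print(conv2d_valid(img, K))                # large responses along the vertical edge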


Convolution example


Vertical edge detection

\[
K = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}
\]


Blurring

\[
K = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
\]


CNN


First-layer convolutional kernels


Feature extraction

Old way of thinking

Feature extraction is distinct from the rest of the machine learning process.

Humans carefully determine ideal features.

Statistical learning techniques determine how to partition space based on training examples.

New way of thinking

Each stage of machine learning increases the level of abstraction.

Going from pixels to an edge.

Going from edges to a parallelogram.

Can automatically learn features from raw data, at a computational cost.


Temporal sequences: unrolling

Select n discrete time steps into the past.

For each, provide additional inputs.

Implications: (often very) superlinear increase in network size.


Research directions

Efficient algorithms

Omit computation on useless input data.

Eliminate redundancy within analysis networks.

Prune near-zero-weight edges after training (see the sketch after this list).

Consider only local relationships at some network depths.

More efficient training methods, e.g., residual networks.

Reduce edge weight precision.
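A minimal sketch of the magnitude-pruning idea; the threshold is an illustrative choice, and real systems often retrain after pruning:

import numpy as np

def prune(W, threshold):
    # Zero out near-zero weights; keep the mask so they stay zero afterward.
    mask = np.abs(W) >= threshold
    return W * mask, mask

W = 0.05 * np.random.randn(64, 64)        # trained weights (synthetic here)
W_pruned, mask = prune(W, threshold=0.05)
print(f"kept {mask.mean():.0%} of weights")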


Biomimetic signal processing

Pattern electronic systems using design idioms from energy-efficient biological systems.

Mimic structure of application-specific signal processing pipelines.


Application-specific hardware

Mirror network structure with hardware.

Enable parallel operations.

Improve communication latency and throughput among algorithmically adjacent computational elements.

Specialized devices that efficiently mimic neural properties.

Note that even the brain uses digital encoding for long-range transmission.


Partitioned networks

Part of the analysis runs on edge devices, part in the cloud.

Designing algorithms with low-cost cut points is challenging.

Could adapt to changing computation/communication costs.
