LINEAR CLASSIFICATION
Biological inspirations
Some numbers…
- The human brain contains about 10 billion nerve cells (neurons)
- Each neuron is connected to the others through about 10,000 synapses

Properties of the brain
- It can learn and reorganize itself from experience
- It adapts to the environment
- It is robust and fault tolerant
Biological neuron (simplified model)
A neuron has
- A branching input structure (the dendrites)
- A branching output structure (the axon)

The information circulates from the dendrites to the axon via the cell body.
The cell body sums up the inputs in some way and fires (generates a signal through the axon) if the result is greater than some threshold.
An Artificial Neuron
[Figure: an artificial neuron combining its inputs through weights]
Definition: a nonlinear, parameterized function with a restricted output range.
Activation Function
Usually not pictured (we’ll see why), but you can imagine a threshold parameter here.
Same Idea using the Notation in the Book
The Output of a Neuron
As described so far…
This simplest form of a neuron is also called a perceptron.
The Output of a Neuron
Other possibilities, such as the sigmoid function for continuous output.
g(in_j) = 1 / (1 + e^(−k · in_j))

where g(in_j) is the activation of the neuron and k is a parameter which controls the shape of the curve.
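The sigmoid neuron above can be sketched directly in code. This is a minimal illustration, not from the slides; the slope parameter k here is an assumption standing in for the shape-controlling parameter mentioned above.

```python
import math

def sigmoid(x, k=1.0):
    """Logistic activation: squashes any real input into (0, 1).
    k is an assumed slope parameter controlling the curve's shape."""
    return 1.0 / (1.0 + math.exp(-k * x))

def neuron_output(weights, inputs):
    """Activation of one neuron: weighted sum of inputs through the sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)
```

Note that sigmoid(0) = 0.5, and the output approaches 0 or 1 as the weighted sum grows large in either direction — a smooth version of the threshold behavior.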
Linear Regression using a Perceptron
Linear regression:
Find a linear function (straight line) that best predicts the continuous-valued output.
Linear Regression As an Optimization Problem
Finding the optimal weights could be solved through:
- Gradient descent
- Simulated annealing
- Genetic algorithms
- … and now: neural networks
Linear Regression using a Perceptron
With input x (weight w1) and a constant input 1 (weight w0), the perceptron computes:

f(x) = w1·x + w0
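As a sketch of how such a single neuron can fit a line, the loop below runs stochastic gradient descent on squared error; the data, learning rate, and iteration count are illustrative assumptions, not from the slides.

```python
# Fit f(x) = w1*x + w0 by gradient descent on squared error.
# Illustrative data drawn from the known line y = 2x + 1 (no noise).
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w0, w1 = 0.0, 0.0
alpha = 0.05                       # assumed learning rate
for _ in range(2000):
    for x, y in data:
        err = y - (w1 * x + w0)    # prediction error on this sample
        w1 += alpha * err * x      # gradient step for the slope weight
        w0 += alpha * err          # bias weight: its input is the constant 1
```

After training, (w0, w1) is close to the generating line (1, 2), because on noiseless data the only fixed point of the updates is zero error on every sample.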
The Bias Term
So far we have defined the output of a perceptron as controlled by a threshold:

x1w1 + x2w2 + x3w3 + … + xnwn ≥ t

But just like the weights, this threshold is a parameter that needs to be adjusted.
Solution: make it another weight:

x1w1 + x2w2 + x3w3 + … + xnwn + (1)(−t) ≥ 0

The weight −t on the constant input 1 is the bias term.
A Neuron with a Bias Term
Another Example
Assign weights to perform the logical OR operation.
A perceptron with inputs A (weight w2) and B (weight w1) and a constant input 1 (weight w0) fires when:

w2·A + w1·B + w0 ≥ 0

w0 = ?
w1 = ?
w2 = ?
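The weights are left blank as an exercise on the slide. One assignment that happens to satisfy all four input cases (an illustrative answer, not the only one) is w0 = −0.5, w1 = w2 = 1, which a short check confirms:

```python
def perceptron_or(A, B, w0=-0.5, w1=1.0, w2=1.0):
    """Threshold unit: fires (returns 1) when w2*A + w1*B + w0 >= 0.
    The default weights are one illustrative solution for logical OR."""
    return 1 if w2 * A + w1 * B + w0 >= 0 else 0
```

Any weights with w0 < 0, w1 + w0 ≥ 0, and w2 + w0 ≥ 0 would work equally well.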
Artificial Neural Network (ANN)
- A mathematical model to solve engineering problems
- A group of highly connected neurons realizing compositions of nonlinear functions

Tasks
- Classification
- Discrimination
- Estimation
Feed-Forward Neural Networks
- The information is propagated from the inputs to the outputs
- There are no cycles between outputs and inputs: the state of the system is not preserved from one iteration to another
[Figure: inputs x1, x2, …, xn feeding a 1st hidden layer, a 2nd hidden layer, and an output layer]
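A forward pass through such a layered network can be sketched as follows; the weight layout (bias first, one row per node) is an assumed convention for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    """One fully connected layer. Each row of `weights` describes one node:
    [bias, w_1, ..., w_n], where the bias multiplies a constant input of 1."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]

def feed_forward(network, inputs):
    """Propagate the inputs through each layer in turn: no cycles,
    and no state carried over between iterations."""
    for weights in network:
        inputs = layer(weights, inputs)
    return inputs
```

For example, a network whose weights are all zero outputs sigmoid(0) = 0.5 regardless of its inputs.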
ANN Structure
- A finite number of inputs
- Zero or more hidden layers
- One or more outputs
- All nodes in the hidden and output layers contain a bias term
Examples
- Handwriting character recognition
- Control of a virtual agent
- ALVINN: a neural-network-controlled AGV (1994)
http://blog.davidsingleton.org/nnrccar
Learning
The procedure of estimating the weight parameters so that the whole network can perform a specific task.
The learning process (supervised)
- Present the network with a number of inputs and their corresponding outputs
- See how closely the actual outputs match the desired ones
- Modify the parameters to better approximate the desired outputs
Perceptron Learning Rule
1. Initialize the weights to some random values (or 0)
2. For each sample (x, y) in the training set:
   a. Calculate the current output of the perceptron, h(x)
   b. Update the weights: w_i ← w_i + α · (y − h(x)) · x_i
3. Repeat until the error is smaller than some predefined threshold

α is the learning rate.
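The steps above can be sketched as follows, assuming a step activation (threshold at 0, with the bias as weight 0) and a fixed number of epochs in place of the error-threshold test:

```python
def train_perceptron(samples, n_inputs, alpha=0.1, epochs=100):
    """Perceptron learning rule: w_i <- w_i + alpha * (y - h(x)) * x_i.
    weights[0] is the bias weight; its input is the constant 1."""
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, y in samples:
            total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            h = 1 if total >= 0 else 0      # step activation
            err = y - h
            weights[0] += alpha * err       # bias input is 1
            for i, xi in enumerate(x):
                weights[i + 1] += alpha * err * xi
    return weights

# Logical AND is linearly separable, so the rule converges on it.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_samples, 2)
```

On separable data like AND, the perceptron convergence theorem guarantees the loop stops making updates after finitely many epochs.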
Linear Separability
Perceptrons can classify any input that is linearly separable.
For more complex problems we need a more complex model.
Different Non-Linearly Separable Problems
Structure      Types of decision regions
Single-Layer   Half plane bounded by a hyperplane
Two-Layer      Convex open or closed regions
Three-Layer    Arbitrary (complexity limited by the number of nodes)

[Figure panels: each structure's decision regions illustrated on the exclusive-OR problem, on classes with meshed regions, and on the most general region shapes]
Calculating the Weights
The weights form a vector of parameters for which we need to find a global optimum.
This could be solved by:
- Simulated annealing
- Gradient descent
- Genetic algorithms
http://www.youtube.com/watch?v=0Str0Rdkxxo
Perceptron learning rule is pretty much gradient descent.
Learning the Weights in a Neural Network
The perceptron learning rule (gradient descent) worked before, but it required us to know the correct output of the node.
How do we know the correct output of a given hidden node?
Backpropagation Algorithm
- Gradient descent over the entire network weight vector
- Easily generalized to arbitrary directed graphs
- Will find a local, not necessarily global, error minimum
- In practice it often works well (can be invoked multiple times with different initial weights)
Backpropagation Algorithm
1. Initialize the weights to some random values (or 0)
2. For each sample in the training set:
   a. Calculate the current output h(x_j) of each node j
   b. For each output node k, compute its error term:
      Δ_k = (y_k − h(x_k)) · h(x_k) · (1 − h(x_k))
   c. For each hidden node j, propagate the error backwards:
      Δ_j = h(x_j) · (1 − h(x_j)) · Σ_k w_{j,k} · Δ_k
   d. For all network weights, update:
      w_{i,j} ← w_{i,j} + α · Δ_j · x_j
3. Repeat until the weights converge or the desired accuracy is achieved
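One update step of this algorithm, written out for a tiny 2-2-1 network of sigmoid nodes, might look like the sketch below; the starting weights, learning rate, and training sample are illustrative assumptions. Repeating the step on a single sample drives the network's output toward its target.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny 2-2-1 network; per node: [bias, w_from_input_1, w_from_input_2].
hidden = [[0.1, 0.4, -0.3], [0.2, -0.2, 0.5]]   # two hidden nodes
out = [0.3, 0.6, -0.4]                           # one output node

def forward(x):
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in hidden]
    o = sigmoid(out[0] + out[1] * h[0] + out[2] * h[1])
    return h, o

def backprop_step(x, y, alpha=0.5):
    """One pass of the update rules above:
    output:  delta_o = (y - o) * o * (1 - o)
    hidden:  delta_j = h_j * (1 - h_j) * w_j * delta_o
    weights: w <- w + alpha * delta * input"""
    h, o = forward(x)
    delta_o = (y - o) * o * (1 - o)
    # hidden deltas must use the output weights *before* they are updated
    delta_h = [h[j] * (1 - h[j]) * out[j + 1] * delta_o for j in range(2)]
    out[0] += alpha * delta_o                    # bias input is 1
    for j in range(2):
        out[j + 1] += alpha * delta_o * h[j]
        hidden[j][0] += alpha * delta_h[j]
        hidden[j][1] += alpha * delta_h[j] * x[0]
        hidden[j][2] += alpha * delta_h[j] * x[1]

x, y = (1.0, 0.0), 1.0
_, before = forward(x)
for _ in range(200):
    backprop_step(x, y)
_, after = forward(x)
```

After the repeated updates, the output on this sample is strictly closer to the target than it was before training.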
Intuition
General idea: hidden nodes are "responsible" for some of the error at the output nodes they connect to.
The change in the hidden weights is proportional to the strength (magnitude) of the connection between the hidden node and the output node.
This is the same as the perceptron learning rule, but for a sigmoid decision function instead of a step decision function (full derivation on p. 726)
w_i ← w_i + α · (y_j − h(x_j)) · h(x_j) · (1 − h(x_j)) · x_j
Questions