Introduction to
Artificial Neural Networks
Kristof Van Laerhoven
kristof@mis.tu-darmstadt.de
http://www.mis.informatik.tu-darmstadt.de/kristof
Course Overview
• We will:
Learn to interpret common ANN formulas and diagrams
Get an extensive overview of the major ANNs
Understand mechanisms behind ANNs
Introduce them with some historical background
• Left as exercises:
Implementations of algorithms and details
Evaluations and proofs
The Start of Artificial Neural Nets
• Standard ANN introduction courses just mention:
McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133.
• But let's give a few more details…
Warren McCulloch (1898 - 1972)
Walter Pitts (1921 - 1969)
(the typical course treatment: ‘real neuron’… ‘simple model’… ‘some authors’…)
before 1943: Neurobiology
• Trepanation: Brain = ‘cranial stuffing’
• ~400 BC: Hippocrates: Brain = “seat of intelligence”
• ~350 BC: Aristotle: Brain cools the blood
• ~160 AD: Galen: Brain damage affects mental functioning
• 1543: Vesalius: anatomy
• 1783: Galvani: electrical excitability of muscles, neurons
before 1943: Neurobiology
• 1898: Camillo Golgi: Golgi Stains
• 1906: Golgi + Santiago Ramón y Cajal: Nobel prize
• ~1850: du Bois-Reymond, Müller, and von Helmholtz:
neurons are electrically excitable
their activity affects the electrical state of adjacent neurons
before 1943: Neurobiology
• In 1943 there already existed a lively community of biophysicists doing mathematical work on neural nets
• Dissections, Golgi stains, and microscopes existed, but no ‘Church-Turing’ for the brain
“One would assume, I think, that the presence of a theory, however strange, in a field in which no theory had previously existed, would have been a spur to the imagination of neurobiologists. But this did not occur at all! The whole field of neurology and neurobiology ignored the structure, the message, and the form of McCulloch’s and Pitts’s theory. Instead, those who were inspired by it were those who were destined to become the aficionados of a new venture, now called Artificial Intelligence, which proposed to realize in a programmatic way the ideas generated by the theory” -- Lettvin, 1989
• Walter Pitts: self-taught in logic and mathematics, read Principia Mathematica at age 12
Homeless, never enrolled as a student, eccentric; wrote a manuscript on 3D networks
• Their seminal 1943 paper: a mathematical (logic) model of the neuron; the nervous system can be considered a kind of universal computing device, as described by Leibniz
» Recommended Reading: Dark Hero Of The Information Age: In Search of Norbert Wiener, the Father of Cybernetics; and search the web for Jerome Lettvin
“What the Frog’s Eye Tells the Frog’s Brain” (1959)
(Hebbian) Learning
• Previously:
Pavlov, 1927: Classical Conditioning
Thorndike, 1898: Operant Conditioning
• Donald Hebb (psychologist, teacher): The Organization of Behavior (1949)
What fires together, wires together: “When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased” -- D. Hebb
• Others: Stephen Grossberg, Gail Carpenter: Adaptive Resonance Theory
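Hebb's postulate is commonly formalised as a weight change proportional to the product of pre- and post-synaptic activity, Δwi = η·xi·y. A sketch of that standard formalisation (it is not spelled out on the slide; the names are mine):

    def hebbian_update(w, x, y, eta=0.1):
        """Strengthen each weight in proportion to joint pre-/post-synaptic activity."""
        return [wi + eta * xi * y for wi, xi in zip(w, x)]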
The Perceptron
• 1957: Frank Rosenblatt: “The perceptron: A perceiving and recognizing automaton (project PARA).”
[Figure: a perceptron: inputs x1, x2, x3, … enter through weights w1, w2, w3, … and, together with the bias b, produce the output y]
x: inputs, w: weights, b: bias
0 < η < 1: learning rate
sign(y): output (usually -1 or 1)
yT: target output (usually -1 or 1)
y = Σi wi xi + b,  i = 1, 2, 3, …
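In Python, the diagram's computation is a one-liner; a minimal sketch (the function names are mine, not from the slides):

    def perceptron_output(x, w, b):
        """Weighted sum of the inputs plus the bias: y = sum_i w[i]*x[i] + b."""
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def sign(y):
        """Threshold the raw output to the class labels -1 or +1."""
        return 1 if y >= 0 else -1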
Learning / Training Phase (Widrow-Hoff, or Delta Rule):
1. Start with random w1, w2, w3, …, b (usually near 0)
2. Calculate y for the current input x1, x2, x3, …
3. Update weights and bias: wi' = wi + η(yT - y)xi and b' = b + η(yT - y)
4. Go to step 2
[Figure: the perceptron from the previous slide, now with the target output yT shown next to the output y for training]
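The four steps translate directly into a loop. A sketch under two assumptions of mine: the endless “go to step 2” is replaced by a fixed number of passes over the data, and samples is a list of (inputs, target) pairs:

    import random

    def train_perceptron(samples, eta=0.3, passes=5):
        """Delta-rule training; samples holds (inputs, target) pairs with targets -1 or +1."""
        n = len(samples[0][0])
        # 1. Start with random weights and bias, usually near 0
        w = [random.uniform(-0.1, 0.1) for _ in range(n)]
        b = random.uniform(-0.1, 0.1)
        for _ in range(passes):
            for x, y_target in samples:
                # 2. Calculate y for the current input
                y = sum(wi * xi for wi, xi in zip(w, x)) + b
                # 3. Update weights and bias
                w = [wi + eta * (y_target - y) * xi for wi, xi in zip(w, x)]
                b = b + eta * (y_target - y)
        return w, b  # 4. stop after a fixed number of passes instead of looping forever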
The Perceptron: Example
• Teach the neuron to spot male authors of emails on a mailing list, given two inputs:
x1: counts of {‘around’, ‘what’, ‘more’} minus counts of {‘with’, ‘if’, ‘not’}
x2: counts of {‘at’, ‘it’, ‘many’} minus counts of {‘myself’, ‘hers’, ‘was’}
• Learning by examples:
Negative samples: mails from female authors
Positive samples: mails from male authors
• (loosely based on http://www.bookblog.net/gender/genie.php and “Gender, Genre, and Writing Style in Formal Written Texts”, Shlomo Argamon, Moshe Koppel, Jonathan Fine, Anat Rachel Shimoni)
Sender:     Subject:                (x1, x2)
Kristof     Coffee machine h…       (6, 1)
Alan        Re: UN petition fo…     (3, 2)
Cath        Printer on B floor i…   (-2, 1)
Hans        Visitors on Mon/Tue     (5, 4)
Christine   Meeting room info…      (-2, -1)
Gordon      Camera Equipme…         (5, -2)
…           …                       …
The Perceptron: Example
[Figure: the six samples plotted in the (x1, x2) plane: alan, hans, kristof, and gordon on the positive side; cath and christine on the negative side]
Instead of using the whole mails, we use only keyword counts. These kinds of ‘pre-processed’ values are called features in the machine learning literature.
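A sketch of how these two features could be computed from a mail body (the word lists are from the slide; the naive whitespace tokenisation and the function name are my own):

    def extract_features(text):
        """Compute the (x1, x2) keyword-count features for one mail."""
        words = text.lower().split()
        def count(keys):
            return sum(words.count(k) for k in keys)
        x1 = count(['around', 'what', 'more']) - count(['with', 'if', 'not'])
        x2 = count(['at', 'it', 'many']) - count(['myself', 'hers', 'was'])
        return x1, x2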
The Perceptron: Example
Initial values:
w1 = 1/2, w2 = -1/2, b = 0
y = Σi wi xi + b = w1x1 + w2x2 + b = (1/2)x1 + (-1/2)x2 + 0
[Figure: the line (1/2)x1 - (1/2)x2 + 0 = 0 drawn in the (x1, x2) plane, with sign(y) = -1 on one side and sign(y) = +1 on the other]
The Perceptron: Example
Values after one training pass over all positive and negative samples (with η = 0.3):
w1 = 2/5, w2 = -1/3, b = -2/7
y = Σi wi xi + b = w1x1 + w2x2 + b = (2/5)x1 + (-1/3)x2 + (-2/7)
[Figure: the updated line (2/5)x1 - (1/3)x2 - 2/7 = 0, with the yT = -1 and yT = +1 samples on either side]
The Perceptron: Example
Values after five training passes over all positive and negative samples (with η = 0.3):
w1 = 9/12, w2 = -2/23, b = -2
y = Σi wi xi + b = w1x1 + w2x2 + b = (9/12)x1 + (-2/23)x2 + (-2)
[Figure: the line (9/12)x1 - (2/23)x2 - 2 = 0 now separates the positive from the negative samples]
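As a quick check against the table from the example: for kristof at (6, 1), y = (9/12)·6 - (2/23)·1 - 2 ≈ 2.41, so sign(y) = +1; for cath at (-2, 1), y = (9/12)·(-2) - (2/23)·1 - 2 ≈ -3.59, so sign(y) = -1. The trained boundary classifies both correctly.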
The Perceptron’s Delta Rule
• Delta Rule or Gradient Descent:
• Error: E = (yT - y)² / 2
• Calculate the partial derivative of the error w.r.t. each weight:
∂E/∂w1 = -(yT - y)x1
• Step against the gradient to update each weight:
Δw1 = w1' - w1 = η(yT - y)x1
• Finds local minima only!
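Spelled out, the derivative is a standard chain-rule step (not shown on the slide): since y = Σi wi xi + b is linear, ∂y/∂wi = xi, and

    E = \tfrac{1}{2}(y_T - y)^2
    \frac{\partial E}{\partial w_i} = (y_T - y)\,\frac{\partial (y_T - y)}{\partial w_i} = -(y_T - y)\,x_i
    \Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(y_T - y)\,x_i

which recovers exactly the update rule from the training phase.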
The Perceptron
• Perceptron has 2 phases:
Training/Learning Phase: adaptive weights
Testing Phase: for checking performance and generalisation
• Neurons and their outputs are grouped in 1 layer:
[Figure: two diagrams: three inputs x1, x2, x3 feeding two outputs y1, y2 through weights w11…w32 and biases b1, b2; and the general case of n inputs x1…xn feeding m outputs y1…ym]
Each output gets its own weights and bias: yj = Σi wij xi + bj
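A whole layer is one matrix multiply-add; a sketch with NumPy (the array layout is my choice, not from the slides):

    import numpy as np

    def layer_output(W, b, x):
        """All m outputs at once: y[j] = sum_i W[j, i] * x[i] + b[j]."""
        # W: (m, n) weight matrix, b: length-m biases, x: length-n inputs
        return W @ x + b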
Output Functions
• Output function per neuron: g(y) limits and scales the output
Simple threshold
Sigmoid: g(x) = 1 / (1 + e^(-x))
Gaussian: g(x) = e^(-x²/2)
Piecewise linear: g(x) = 0 if x < -1; (x+1)/2 if -1 < x < 1; 1 if 1 < x
Hyperbolic tangent: g(x) = tanh(x)
…
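The formulas above as plain Python, for reference (a direct transcription; math.tanh already covers the last one):

    import math

    def threshold(x):            # simple threshold, as used by sign(y)
        return 1 if x >= 0 else -1

    def sigmoid(x):              # g(x) = 1 / (1 + e^-x)
        return 1.0 / (1.0 + math.exp(-x))

    def gaussian(x):             # g(x) = e^(-x^2 / 2)
        return math.exp(-x * x / 2.0)

    def piecewise_linear(x):     # 0 below -1, 1 above +1, linear in between
        if x < -1:
            return 0.0
        if x > 1:
            return 1.0
        return (x + 1) / 2.0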
Parameters
• Bias can be seen as a weight on a constant input of 1, or as an input with a constant weight of 1
• Output-function parameters (e.g., steepness of the sigmoid)
• Momentum term to avoid local minima
• Initialisation: use a ‘Committee of networks’ to avoid random-initialisation effects (many networks training on the same data, each with differently initialised weights)
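The bias-as-weight view from the first bullet, in code: append a constant input of 1 and the bias becomes an ordinary weight, so one update rule covers both (a sketch; the helper name is mine):

    def augment(x):
        """Append a constant input of 1; its weight plays the role of the bias."""
        return list(x) + [1]

    # With x_aug = augment(x) and one extra weight, y = sum_i w[i] * x_aug[i]
    # already includes the bias as w[-1] * 1.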