Neural Network Architectures
Aydın Ulaş
02 December 2004
[email protected]
Outline Of Presentation
Introduction
Neural Networks
Neural Network Architectures
Conclusions
Introduction
Some numbers…
– The human brain contains about 10 billion nerve cells (neurons)
– Each neuron is connected to the others through about 10,000 synapses
The brain as a computational unit
– It can learn and reorganize from experience
– It adapts to the environment
– It is robust and fault tolerant
– Fast computation with a huge number of simple computational units
Introduction
Taking nature as a model: consider the neuron as a processing element (PE)
A neuron has
– Inputs (dendrites)
– An output (the axon)
The information flows from the dendrites to the axon via the cell body
The axon connects to dendrites via synapses
– The strength of synapses can change
– Synapses may be excitatory or inhibitory
Perceptron (Artificial Neuron)
Definition: a nonlinear, parameterized function with a restricted output range
o = f( w_0 + Σ_{i=1}^{n} w_i x_i )

[Figure: perceptron diagram — inputs x_0 = 1, x_1, x_2, …, x_d, weights w_0, w_1, w_2, …, w_d, a weighted sum passed through an activation function to produce the output o]
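As a concrete illustration, here is a minimal sketch of the perceptron above in Python with NumPy; the sigmoid activation and the example weights are illustrative assumptions, not values from the slides:

import numpy as np

def sigmoid(x):
    # Logistic activation: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def perceptron(x, w, f=sigmoid):
    # w[0] is the bias weight w_0 (paired with the constant input x_0 = 1);
    # the output is o = f(w_0 + sum_i w_i * x_i)
    return f(w[0] + np.dot(w[1:], x))

# Illustrative example: 3 inputs, arbitrary weights
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3, 0.2])  # [w_0, w_1, w_2, w_3]
print(perceptron(x, w))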
Activation Functions
Linear: y = x
Sigmoid: y = 1 / (1 + exp(-x))
Hyperbolic tangent: y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

[Figure: plots of the sigmoid and hyperbolic tangent activations on [-10, 10], and of the linear activation]
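The same three activations as short NumPy functions (np.tanh computes the same ratio of exponentials as the formula above):

import numpy as np

def linear(x):
    return x                          # y = x, unbounded output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output restricted to (0, 1)

def tanh(x):
    return np.tanh(x)                 # (e^x - e^-x)/(e^x + e^-x), output in (-1, 1)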
Neural Networks
A mathematical model to solve engineering problems
– A group of highly connected neurons realizing compositions of nonlinear functions
Tasks
– Classification
– Clustering
– Regression
According to input flow
– Feed-forward neural networks
– Recurrent neural networks
Feed Forward Neural Networks
The information is propagated from the inputs to the outputs
Time plays no role (acyclic: no feedback from outputs to inputs)
[Figure: feed-forward network — inputs x_0 = 1, x_1, …, x_k, …, x_d with first-layer weights w_jk; hidden units h_0 = 1, h_1, …, h_j, …, h_s with second-layer weights v_ij; outputs o_1, …, o_i, …, o_m]
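A minimal sketch of the forward pass of the two-layer network in the figure, assuming sigmoid hidden units and linear outputs (the slides do not fix the activations):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W, V):
    # W: (s, d+1) first-layer weights w_jk; V: (m, s+1) second-layer weights v_ij.
    # Column 0 of each matrix is the bias, paired with the constant
    # units x_0 = 1 and h_0 = 1.
    x = np.concatenate(([1.0], x))    # prepend x_0 = 1
    h = sigmoid(W @ x)                # hidden activations h_1..h_s
    h = np.concatenate(([1.0], h))    # prepend h_0 = 1
    return V @ h                      # outputs o_1..o_m (linear here)

# Illustrative sizes: d=4 inputs, s=3 hidden units, m=2 outputs
rng = np.random.default_rng(0)
print(forward(rng.normal(size=4), rng.normal(size=(3, 5)), rng.normal(size=(2, 4))))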
Recurrent Networks
Arbitrary topologies
Can model systems with internal states (dynamic ones, sketched below)
Delays can be modeled
More difficult to train
Problematic performance
– Stable outputs may be more difficult to evaluate
– Unexpected behavior (oscillation, chaos, …)

[Figure: recurrent network over inputs x_1, x_2 with feedback connections]
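To make the internal state concrete, a sketch of a single recurrent layer update; the tanh activation and the update form h_t = tanh(W_in x_t + W_rec h_{t-1} + b) are common choices assumed here, not taken from the slides:

import numpy as np

def rnn_step(x, h, W_in, W_rec, b):
    # The hidden state h is fed back into the next step, so the state
    # at time t depends on the whole input history, not just x_t.
    return np.tanh(W_in @ x + W_rec @ h + b)

# Illustrative run over a short input sequence
rng = np.random.default_rng(0)
W_in, W_rec, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x in rng.normal(size=(5, 2)):   # 5 time steps, 2 inputs each
    h = rnn_step(x, h, W_in, W_rec, b)
print(h)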
Learning
The procedure of estimating the parameters of the neurons (setting the weights) so that the whole network can perform a specific task
2 types of learning
– Supervised learning
– Unsupervised learning
The learning process (supervised)
– Present the network a number of inputs and their corresponding outputs (training)
– See how closely the actual outputs match the desired ones
– Modify the parameters to better approximate the desired outputs
– Several passes over the data (a minimal sketch follows)
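A sketch of this loop for the simplest possible model, a single linear unit; the mean-squared-error measure and the learning rate are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # example inputs
Y = X @ np.array([1.0, -2.0, 0.5])      # desired outputs (a known target here)
w = np.zeros(3)                          # parameters to estimate

for epoch in range(100):                 # several passes over the data
    out = X @ w                          # actual outputs for all inputs
    err = out - Y                        # how closely they match the desired ones
    w -= 0.01 * X.T @ err / len(X)       # modify parameters to reduce the mismatch
print(np.mean(err ** 2))                 # remaining mean squared error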
Supervised Learning
The true outputs of the model for the given inputs are known in advance. The network's task is to approximate those outputs.
A "supervisor" provides examples and teaches the neural network how to fulfill a certain task
Unsupervised learning
Group typical input data according to some function
Data clustering
No need for a supervisor
– The network itself finds the correlations in the data
– Example:
• Kohonen feature maps (SOM), sketched below
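A minimal sketch of the SOM idea: without a supervisor, each input pulls its best-matching unit, and that unit's map neighbors, toward it. The 1-D map size, learning rate, and Gaussian neighborhood are illustrative assumptions (a real SOM would also decay them over time):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))         # unlabeled 2-D inputs
nodes = rng.normal(size=(10, 2))         # a 1-D map of 10 units

for x in data:
    bmu = np.argmin(np.linalg.norm(nodes - x, axis=1))   # best-matching unit
    for j in range(len(nodes)):
        # Gaussian neighborhood on the map: units near the BMU move too
        influence = np.exp(-((j - bmu) ** 2) / 2.0)
        nodes[j] += 0.1 * influence * (x - nodes[j])
print(nodes)                              # units now cluster around the data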
Properties of Neural Networks
Supervised networks are universal approximators (non-recurrent networks)
Can act as
– Linear approximator (linear perceptron)
– Nonlinear approximator (multi-layer perceptron)
Other Properties
Adaptivity
– Adapt weights to the environment easily
Ability to generalize
– May cope with a lack of data
Fault tolerance
– Performance degrades little if the network is damaged; the information is distributed over the entire net
Example: Classification
Handwritten digit recognition
16x16 bitmap representation
– Converted to a 1x256 bit vector
7500 points in the training set, 3500 points in the test set
Example bit vector for one digit:
0000000001100000000000011010000000000001000000000000001000000000000001000000000000001000000000000000100000000000000010000000000000001000000000000001000111110000000101100001100000011000000010000001100000001000000100000000100000001000000100000000011111110000
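A sketch of the bitmap-to-vector conversion, assuming the digit is available as a 16x16 array of 0/1 pixels (the stroke drawn here is made up for illustration):

import numpy as np

bitmap = np.zeros((16, 16), dtype=np.uint8)   # a 16x16 bitmap (illustrative)
bitmap[4:12, 7] = 1                            # a crude vertical stroke
vector = bitmap.flatten()                      # 1x256 input vector for the network
print("".join(map(str, vector)))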
Training
Try to minimize an error (cost) function
Backpropagation algorithm
– Gradient descent
Learn the weights of the network: update the weights according to the gradient of the error function (sketched below)
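A minimal sketch of gradient descent with backpropagation for a one-hidden-layer network and squared error; the sigmoid activations, layer sizes, learning rate, and omitted bias terms are simplifying assumptions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # inputs
Y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # illustrative binary targets
W = rng.normal(size=(4, 8)) * 0.5             # input -> hidden weights
V = rng.normal(size=(8, 1)) * 0.5             # hidden -> output weights

for epoch in range(500):
    h = sigmoid(X @ W)                        # forward pass
    o = sigmoid(h @ V)
    # Backward pass: propagate the output error back through the layers
    d_o = (o - Y) * o * (1 - o)               # delta at the output layer
    d_h = (d_o @ V.T) * h * (1 - h)           # delta at the hidden layer
    V -= 0.1 * h.T @ d_o / len(X)             # gradient-descent weight updates
    W -= 0.1 * X.T @ d_h / len(X)
print(np.mean((o - Y) ** 2))                  # final mean squared error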
Applications
Handwritten digit recognition
Face recognition
Time series prediction
Process identification
Process control
Optical character recognition
Etc…
Neural Networks
Neural networks are statistical tools
– They adjust nonlinear functions to accomplish a task
– They need multiple and representative examples, but fewer than other methods
Neural networks can model static (FF) and dynamic (RNN) tasks
NNs are good classifiers, BUT
– Good representations of the data have to be formulated
– Training vectors must be statistically representative of the entire input space
Using NNs requires a good comprehension of the problem
Implementation of Neural Networks
Generic architectures (PCs, etc.)
Specific neuro-hardware
Dedicated circuits
Generic architectures
Conventional microprocessors: Intel Pentium, PowerPC, etc.
Advantages
– High performance (clock frequency, etc.)
– Cheap
– Software environment available (NN tools, etc.)
Drawbacks
– Too generic, not optimized for very fast neural computations
Classification of Hardware
NN hardware
– Neurochips
• Special purpose
• General purpose (Ni1000, L-Neuro)
– Neurocomputers
• Special purpose (CNAPS, Synapse)
• General purpose
Specific neuro-hardware circuits
Commercial chips: CNAPS, Synapse, etc.
Advantages
– Closer to the neural applications
– High performance in terms of speed
Drawbacks
– Not optimized for specific applications
– Availability
– Development tools
CNAPS
SIMD architecture
One instruction sequencing and control unit
Processor nodes (PNs)
One-dimensional array (each node communicates only with its right and left neighbors)
Dedicated circuits
A system where the functionality is buried in the hardware
For specific applications only; not changeable
Advantages
– Optimized for a specific application
– Higher performance than the other systems
Drawbacks
– High development costs in terms of time and money
What type of hardware should be used in dedicated circuits?
Custom circuits
– ASIC (Application-Specific Integrated Circuit)
– Requires good knowledge of hardware design
– Fixed architecture, hardly changeable
– Often expensive
Programmable logic
– Valuable for implementing real-time systems
– Flexibility
– Low development costs
– Lower performance compared to ASICs (frequency, etc.)
Programmable logic
Field Programmable Gate Arrays (FPGAs)
– Matrix of logic cells
– Programmable interconnections
– Additional features (internal memories + embedded resources like multipliers, etc.)
– Reconfigurability
• The configuration can be changed as many times as desired
Real-Time Systems
Execution of applications with time constraints
– Hard real-time systems
• Digital fly-by-wire control system of an aircraft: no lateness is accepted; people's lives depend on the correct working of the aircraft's control system
– Soft real-time systems
• Vending machine: lower performance due to lateness is acceptable; it is not catastrophic when deadlines are missed, it simply takes longer to handle one client
Real-Time Systems
ms-scale real-time system
– Connectionist retina for image processing
• Artificial retina: combining an image sensor with a parallel architecture
µs-scale real-time system
– Level-1 trigger in a HEP experiment
Connectionist Retina
Integration of a neural network in an artificial retina
Screen
– Matrix of active pixel sensors
ADC (CAN)
– 8-bit ADC converter: 256 levels of grey
Processing architecture
– Parallel system where the neural networks are implemented

[Figure: eye → screen (pixel matrix) → ADC (CAN) → processing architecture]
Maharadja Processing Architecture
Micro-controller
– Generic architecture executing sequential code with low power consumption
Memory
– 256 Kbytes shared between the processor, the PEs, and the input
– Stores the network parameters
UNE (neural unit)
– SIMD, completely pipelined, 16-bit internal data bus
– Processors that compute the neuron outputs
– A command bus manages all the different operators in the UNE
Input/output module
– Data acquisition and storage of intermediate results
[Figure: block diagram — the micro-controller and sequencer drive an instruction bus and a command bus connecting the input/output unit and four neural units (UNE-0 to UNE-3), each with its own memory M]
Level-1 trigger in a HEP experiment
High Energy Physics (particle physics)
Neural networks have provided interesting results as triggers in HEP
– Level 2: H1 experiment, 10 – 20 µs
– Level 1: Dirac experiment, 2 µs
Particle recognition
High timing constraints (in terms of latency and data throughput)
Neural Network architecture
[Figure: feed-forward network with 128 inputs, 64 hidden units, and 4 outputs (electrons, tau, hadrons, jets)]
Execution time: ~500 ns, with data arriving every BC = 25 ns
Weights coded in 16 bits, states coded in 8 bits
Very Fast Architecture
[Figure: matrix of processing elements (PEs), each row feeding an accumulator (ACC) and a TanH unit, with a control unit and an I/O module]
256 PEs in an n*m matrix
Control unit
I/O module
TanH values are stored in LUTs
One matrix row computes a neuron
The results are propagated back through the matrix to calculate the output layer
PE architecture
[Figure: PE datapath — 8-bit input data and 16-bit weights from a weight memory (with address generator) feed a multiplier and an accumulator; a control module on the command bus manages data in and data out]
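A sketch of what a single PE contributes: one fixed-point multiply-accumulate per input, using the 8-bit state and 16-bit weight widths quoted earlier; the exact number formats of the real chip are not given in the slides, so this is only illustrative:

def pe_mac(acc, state, weight):
    # One multiply-accumulate step of a PE: 8-bit state x 16-bit signed weight,
    # accumulated in a wider register to avoid overflow.
    assert 0 <= state < 2**8 and -2**15 <= weight < 2**15
    return acc + state * weight

# A neuron's weighted sum is a chain of MACs along one matrix row
states = [12, 250, 7]            # 8-bit input states
weights = [300, -1200, 45]       # 16-bit signed weights
acc = 0
for s, w in zip(states, weights):
    acc = pe_mac(acc, s, w)
print(acc)                        # would then be passed to the TanH LUT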
Neuro-hardware today
Generic real-time applications
– Microprocessor technology (PCs, i.e. software) is sufficient to implement most neural applications in real time (ms or sometimes µs scale)
• This solution is cheap
• Very easy to manage
Constrained real-time applications
– There remain specific applications where powerful computations are needed, e.g. particle physics
– There remain applications where other constraints have to be taken into consideration (power consumption, proximity to sensors, mixed integration, etc.)
Clustering
Idea: combine the performance of different processors to perform massive parallel computations

[Figure: several computers linked by a high-speed connection]
Clustering
Advantages
– Takes advantage of the implicit parallelism of neural networks
– Utilization of systems already available (universities, labs, offices, etc.)
– High performance: faster training of a neural net
– Very cheap compared to dedicated hardware
Clustering
Drawbacks
– Communication load: needs very fast links between the computers
– Software environment for parallel processing
– Not possible for embedded applications
Hardware Implementations
Most real-time applications do not need a dedicated hardware implementation
– Conventional architectures are generally appropriate
– Clustering of generic architectures can combine their performance
Some specific applications require other solutions
– Strong timing constraints
• Technology permits the use of FPGAs
– Flexibility
– Massive parallelism possible
– Other constraints (power consumption, etc.)
• Custom or programmable circuits