Source: Shodhganga, shodhganga.inflibnet.ac.in/bitstream/10603/8270/11/11_chapter 2.pdf
CHAPTER 2
LITERATURE SURVEY
2.1 Neural Networks Basics:
An Artificial Neural Network (ANN) is an information processing
paradigm that is inspired by the way biological nervous systems, such as
the brain, process information. The key element of this paradigm is the
novel structure of the information processing system. It is composed of a
large number of highly interconnected processing elements (neurons)
working in unison to solve specific problems. ANNs, like people, learn
by example. An ANN is configured for a specific application, such as
pattern recognition or data classification, through a learning process.
Learning in biological systems involves adjustments to the synaptic
connections that exist between the neurons. This is true of ANNs as well.
2.1.1 Why Use Neural Networks:
Neural networks, with their remarkable ability to derive meaning
from complicated or imprecise data, can be used to extract patterns and
detect trends that are too complex to be noticed by either humans or other
computer techniques. A trained neural network can be thought of as an
"expert" in the category of information it has been given to analyze. This
expert can then be used to provide projections given new situations of
interest and answer "what if" questions.
2.1.2 Advantages of ANN:
i. Adaptive learning: An ability to learn how to do tasks based on
the data given for training or initial experience.
ii. Self-Organization: An ANN can create its own organization or
representation of the information it receives during learning time.
iii. Real Time Operation: ANN computations may be carried out in
parallel, and special hardware devices are being designed and
manufactured which take advantage of this capability.
iv. Fault Tolerance via Redundant Information Coding: Partial
destruction of a network leads to the corresponding degradation of
performance. However, some network capabilities may be retained
even with major network damage.
2.1.3 Neural Networks versus Conventional Computers:
Neural networks take a different approach to problem solving than
that of conventional computers. Conventional computers use an
algorithmic approach i.e. the computer follows a set of instructions in
order to solve a problem. Unless the specific steps that the computer
needs to follow are known the computer cannot solve the problem. That
restricts the problem solving capability of conventional computers to
problems that we already understand and know how to solve. But
computers would be so much more useful if they could do things that we
don't exactly know how to do.
Neural networks process information in a way similar to the human
brain. The network is composed of a large number of highly
interconnected processing elements (neurons) working in parallel to solve
a specific problem. Neural networks learn by example. They cannot be
programmed to perform a specific task. The examples must be selected
carefully; otherwise useful time is wasted or, even worse, the network
might function incorrectly. The disadvantage is that because the
network finds out how to solve the problem by itself, its operation can be
unpredictable.
On the other hand, conventional computers use a cognitive
approach to problem solving; the way the problem is to be solved must be
known and stated in small unambiguous instructions. These instructions
are then converted to a high level language program and then into
machine code that the computer can understand. These machines are
totally predictable; if anything goes wrong, it is due to a software or
hardware fault.
Neural networks and conventional algorithmic computers are not in
competition but complement each other. There are tasks that are more
suited to an algorithmic approach, like arithmetic operations, and tasks
that are more suited to neural networks. Moreover, a large number of tasks
require systems that use a combination of the two approaches (normally a
conventional computer is used to supervise the neural network) in order
to perform at maximum efficiency.
2.1.4 The Neuron:
The neuron is the basic building block of the neural network. A
neuron is a communication conduit that both accepts input and produces
output. The neuron receives its input either from other neurons or the user
program. Similarly, the neuron sends its output to other neurons or the
user program.
Figure 2.1. Mathematical representation of a Neuron
The commonest type of artificial neural network consists of three groups,
or layers, of units: a layer of "input" units is connected to a layer of
"hidden" units, which is connected to a layer of "output" units. (See
Figure 1)
• The activity of the input units represents the raw information
that is fed into the network.
• The activity of each hidden unit is determined by the activities
of the input units and the weights on the connections between
the input and the hidden units.
• The behavior of the output units depends on the activity of the
hidden units and the weights between the hidden and output
units.
This simple type of network is interesting because the hidden units
are free to construct their own representations of the input. The weights
between the input and hidden units determine when each hidden unit is
active, and so by modifying these weights, a hidden unit can choose what
it represents.
We also distinguish single-layer and multi-layer architectures. The
single-layer organization, in which all units are connected to one another,
constitutes the most general case and is of more potential computational
power than hierarchically structured multi-layer organizations. In multi-
layer networks, units are often numbered by layer, instead of following a
global numbering.
Figure 2.2. Multi layer Architecture
2.1.5 Neuron Connection Weights:
The previous section already mentioned that neurons are usually
connected together. These connections are not equal, and can be assigned
individual weights. These weights are what give the neural network the
ability to recognize certain patterns. Adjust the weights, and the neural
network will recognize a different pattern.
Adjustment of these weights is a very important operation. Later
chapters will show you how neural networks can be trained. The process
of training is adjusting the individual weights between each of the
individual neurons until we achieve close to the desired output.
2.1.6 The Learning Process:
The memorization of patterns and the subsequent response of the network
can be categorized into two general paradigms [8]:
• Associative mapping, in which the network learns to produce a
particular pattern on the set of output units whenever another
particular pattern is applied on the set of input units. The
associative mapping can generally be broken down into two
mechanisms:
o Auto-association: an input pattern is associated with itself and the
states of input and output units coincide. This is used to provide
pattern completion, i.e. to produce a pattern whenever a portion of
it or a distorted pattern is presented.
o Hetero-association: the network stores pairs of patterns, building
an association between two sets of patterns. It is related to two
recall mechanisms: nearest-neighbor recall, where the output
pattern produced corresponds to the stored input pattern that is
closest to the pattern presented, and interpolative recall, where
the output pattern is a similarity-dependent interpolation of the
stored patterns corresponding to the pattern presented.
Yet another paradigm, which is a variant of associative mapping, is
classification, i.e. when there is a fixed set of categories into
which the input patterns are to be classified.
• Regularity detection, in which units learn to respond to particular
properties of the input patterns. Whereas in associative mapping
the network stores the relationships among patterns, in regularity
detection the response of each unit has a particular 'meaning'. This
type of learning mechanism is essential for feature discovery and
knowledge representation.
Every neural network possesses knowledge which is contained in
the values of the connections weights. Modifying the knowledge stored in
the network as a function of experience implies a learning rule for
changing the values of the weights.
All learning methods used for adaptive neural networks can be classified
into two major categories:
• Supervised learning, which incorporates an external teacher, so
that each output unit is told what its desired response to input
signals ought to be. During the learning process global information
may be required. Paradigms of supervised learning include
error-correction learning, reinforcement learning and stochastic learning.
An important issue concerning supervised learning is the problem
of error convergence, i.e. the minimization of error between the
desired and computed unit values. The aim is to determine a set of
weights which minimizes the error. One well-known method,
which is common to many learning paradigms, is the least mean
square (LMS) convergence.
• Unsupervised learning, which uses no external teacher and is based
upon only local information. It is also referred to as self-organization,
in the sense that it self-organizes data presented to the network and
detects their emergent collective properties. Paradigms of
unsupervised learning are Hebbian learning and competitive
learning.
A neural network learns on-line if it learns and operates at
the same time. Usually, supervised learning is performed off-line,
whereas unsupervised learning is performed on-line.
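The least mean square (LMS) idea mentioned under supervised learning can be illustrated with a small sketch. This is not taken from [8]; it assumes a single linear unit, a fixed learning rate, and a made-up four-sample data set, and the names train_lms and mean_squared_error are hypothetical.

```python
# Minimal LMS (delta rule) sketch for one linear unit.
# Assumptions: fixed learning rate, on-line updates, toy data generated
# from desired = 2*x1 - 1*x2 + 0.5 (chosen only for illustration).

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def mean_squared_error(samples, weights, bias):
    """Average squared difference between desired and computed values."""
    errors = [(desired - predict(weights, bias, inputs)) ** 2
              for inputs, desired in samples]
    return sum(errors) / len(errors)


def train_lms(samples, learning_rate=0.1, epochs=200):
    """Adjust the weights in the direction that reduces the squared error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


samples = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5),
           ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.5)]
error_before = mean_squared_error(samples, [0.0, 0.0], 0.0)
weights, bias = train_lms(samples)
error_after = mean_squared_error(samples, weights, bias)
print(error_before, error_after)
```

The "teacher" here is simply the desired value stored with each sample; the error between desired and computed values shrinks as the weights are adjusted.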
Transfer Function [8]:
The behavior of an ANN (Artificial Neural Network) depends on
both the weights and the input-output function (transfer function) that is
specified for the units. This function typically falls into one of three
categories: linear (or ramp), threshold and sigmoid.
For linear units, the output activity is proportional to the total weighted
input. For threshold units, the output is set at one of two levels,
depending on whether the total input is greater than or less than some
threshold value. For sigmoid units, the output varies continuously but not
linearly as the input changes. Sigmoid units bear a greater resemblance to
real neurons than do linear or threshold units, but all three must be
considered rough approximations.
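The three categories of transfer function can be sketched as follows. The output levels 0 and 1 for the threshold unit, the default slope of 1, and the logistic form of the sigmoid are assumptions made for illustration; the text above does not fix them.

```python
import math


def linear(total_input, slope=1.0):
    """Linear unit: output is proportional to the total weighted input."""
    return slope * total_input


def threshold(total_input, theta=0.0):
    """Threshold unit: one of two levels (assumed 0 and 1 here).
    An input exactly equal to theta is mapped to 0 by assumption."""
    return 1.0 if total_input > theta else 0.0


def sigmoid(total_input):
    """Sigmoid unit (logistic form assumed): continuous but non-linear."""
    return 1.0 / (1.0 + math.exp(-total_input))


for value in (-2.0, 0.0, 2.0):
    print(value, linear(value), threshold(value), sigmoid(value))
```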
2.1.7 Error Calculation [9]:
Error calculation is an important aspect of any neural network.
Whether the neural network is supervised or unsupervised, an error rate
must be calculated. The goal of virtually all training algorithms is to
minimize the error. In this section we will examine how the error is
calculated for a supervised neural network. We will also discuss how the
error is determined for an unsupervised training algorithm. We will begin
this section by discussing two error calculation steps used for supervised
training.
Error Calculation and Supervised Training [9]:
Error calculation is an important part of the supervised training
algorithm. In this section we will examine an error calculation method
that can be employed by supervised training. For supervised training
there are two components to the error that must be considered. First, we
must calculate the error for each element of the training set as it is
processed. Secondly, we must take the average across all samples in the
training set. For example, consider the XOR problem, which has only four
items in its training set. An output error would be calculated on each
element of the training set. Finally, after all elements have been
processed, the root mean square (RMS) error is determined.
Output Error:
The output error is simply an error calculation that is done to
determine how far off a neural network's output was from the ideal
output. This value is rarely used for any purpose other than as a stepping
stone on the way to the calculation of root mean square (RMS) error.
Once all elements of the training set have been used the RMS error can be
calculated. This error acts as the global error for the entire neural network.
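As a rough illustration of the two steps, the sketch below computes the per-element output errors and then the RMS error for the four-item XOR training set. The actual outputs are invented numbers standing in for an imperfectly trained network, and the function names are hypothetical rather than taken from [9].

```python
import math


def output_errors(ideal, actual):
    """Per-output difference between ideal and actual values for one element."""
    return [i - a for i, a in zip(ideal, actual)]


def rms_error(ideal_outputs, actual_outputs):
    """Root mean square of all output errors across the training set."""
    squared = [e ** 2
               for ideal, actual in zip(ideal_outputs, actual_outputs)
               for e in output_errors(ideal, actual)]
    return math.sqrt(sum(squared) / len(squared))


# XOR training set: ideal outputs for inputs (0,0), (0,1), (1,0), (1,1).
xor_ideal = [[0.0], [1.0], [1.0], [0.0]]
# Invented outputs of a partially trained network (assumption).
xor_actual = [[0.1], [0.9], [0.8], [0.2]]

global_error = rms_error(xor_ideal, xor_actual)
print(global_error)
```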
2.1.8 A Feed Forward Neural Network:
A "feed forward" neural network [9] is similar to the types of neural
networks that we have already examined. Just like many other neural
network types the feed forward neural network begins with an input layer.
This input layer must be connected to a hidden layer. This hidden
layer can then be connected to another hidden layer or directly to the
output layer. There can be any number of hidden layers so long as at least
one hidden layer is provided. In common use most neural networks will
have only one hidden layer. It is very rare for a neural network to have
more than two hidden layers. We will now examine, in detail, the
structure of a "feed forward neural network".
The Structure of a Feed Forward Neural Network:
A "feed forward" neural network differs from the neural networks
previously examined. Figure 2.3 shows a typical feed forward neural
network with a single hidden layer.
Figure 2.3. A typical feed forward neural network with a single hidden
layer
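To make the flow from input layer to hidden layer to output layer concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. It assumes sigmoid units and bias terms, and the weights are hand-picked (not trained) so that the network approximates XOR; none of these choices come from [9].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """Each unit applies the sigmoid to its weighted input sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(inputs, hidden_weights, hidden_biases,
                 output_weights, output_biases):
    """Input layer -> single hidden layer -> output layer."""
    hidden = layer_output(inputs, hidden_weights, hidden_biases)
    return layer_output(hidden, output_weights, output_biases)


# Hand-picked weights (assumption): hidden unit 1 acts like OR,
# hidden unit 2 like NAND, and the output unit like AND.
hidden_weights = [[20.0, 20.0], [-20.0, -20.0]]
hidden_biases = [-10.0, 30.0]
output_weights = [[20.0, 20.0]]
output_biases = [-30.0]

patterns = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
results = [round(feed_forward(p, hidden_weights, hidden_biases,
                              output_weights, output_biases)[0])
           for p in patterns]
print(results)
```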
Choosing the Network Structure:
As we saw in the previous section, there are many ways that feed
forward neural networks can be constructed. You must decide how many
neurons will be inside the input and output layers. You must also decide
how many hidden layers you're going to have, as well as how many
neurons will be in each of these hidden layers.
There are many techniques for choosing these parameters. In this
section we will cover some of the general "rules of thumb" that you can
use to assist you in these decisions. Rules of thumb will only take you so
far. In nearly all cases some experimentation will be required to
determine the optimal structure for your "feed forward neural network".
The Input Layer:
The input layer to the neural network is the conduit through which
the external environment presents a pattern to the neural network. Once a
pattern is presented to the input layer of the neural network the output
layer will produce another pattern. In essence this is all the neural
network does. The input layer should represent the condition for which
we are training the neural network. Every input neuron should
represent some independent variable that has an influence over the output
of the neural network.
It is important to remember that the inputs to the neural network
are floating point numbers. These values are expressed as the primitive
Java data type "double". This is not to say that you can only process
numeric data with the neural network. If you wish to process a form of
data that is non-numeric you must develop a process that normalizes this
data to a numeric representation.
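One possible way to turn non-numeric data into floating point inputs is sketched below. One-of-n (one-hot) encoding and min-max scaling are assumptions chosen for illustration; the text does not prescribe a particular normalization, and the category list is invented. Python floats are double precision, so they play the role of the "double" values mentioned above.

```python
def one_hot(value, categories):
    """Encode a category as one input neuron per possible value."""
    return [1.0 if value == category else 0.0 for category in categories]


def min_max_scale(value, low, high):
    """Map a number from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


colours = ["red", "green", "blue"]  # hypothetical categorical field
encoded_colour = one_hot("green", colours)
encoded_age = min_max_scale(5, 0, 10)
print(encoded_colour, encoded_age)
```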
The Output Layer:
The output layer of the neural network is what actually presents a
pattern to the external environment. Whatever pattern is presented by the
output layer can be directly traced back to the input layer. The number of
output neurons should be directly related to the type of work that the
neural network is to perform.
To determine the number of neurons to use in your output layer you
must consider the intended use of the neural network. If the neural
network is to be used to classify items into groups, then it is often
preferable to have one output neuron for each group that the item is to be
assigned into. If the neural network is to perform noise reduction on a
signal then it is likely that the number of input neurons will match the
number of output neurons. In this sort of neural network you would
want the patterns to leave the neural network in the same
format as they entered.
The Number of Hidden Layers:
There are really two decisions that must be made with regards to
the hidden layers. The first is how many hidden layers to actually have in
the neural network. Secondly, you must determine how many neurons
will be in each of these layers. We will first examine how to determine
the number of hidden layers to use with the neural network.
Neural networks with two hidden layers can represent functions
with any kind of shape. There is currently no theoretical reason to use
neural networks with any more than two hidden layers. Further for many
practical problems there's no reason to use any more than one hidden
layer. Problems that require two hidden layers are rarely encountered.
Differences between the numbers of hidden layers are summarized in
Table 2.1.
Number of Hidden Layers    Result
None                       Only capable of representing linearly separable
                           functions or decisions.
1                          Can approximate arbitrarily well any function
                           which contains a continuous mapping from one
                           finite space to another.
2                          Can represent an arbitrary decision boundary to
                           arbitrary accuracy with rational activation
                           functions and can approximate any smooth
                           mapping to any accuracy.
Table 2.1: Determining the number of hidden layers
Just deciding the number of hidden neuron layers is only a small
part of the problem. You must also determine how many neurons will be
in each of these hidden layers. This process is covered in the next section.
The Number of Neurons in the Hidden Layers:
Deciding the number of neurons in the hidden layers is a very
important part of deciding your overall neural network architecture.
Though these layers do not directly interact with the external environment,
they have a tremendous influence on the final output. Both the
number of hidden layers and the number of neurons in each of these hidden
layers must be considered.
Using too few neurons in the hidden layers will result in something
called underfitting. Underfitting occurs when there are too few neurons
in the hidden layers to adequately detect the signals in a complicated data
set. Using too many neurons in the hidden layers can result in several
problems. First, too many neurons in the hidden layers may result in
overfitting. Overfitting occurs when the neural network has so much
information processing capacity that the limited amount of information
contained in the training set is not enough to train all of the neurons in the
hidden layers [9].
A second problem can occur even when there is sufficient training
data. An inordinately large number of neurons in the hidden layers can
increase the time it takes to train the network. The amount of training
time can increase enough so that it is impossible to adequately train the
neural network. Obviously some compromise must be reached between
too many and too few neurons in the hidden layers.
There are many rule-of-thumb methods for determining the correct
number of neurons to use in the hidden layers. Some of them are
summarized as follows.
• The number of hidden neurons should be in the range between the size
of the input layer and the size of the output layer.
• The number of hidden neurons should be 2/3 of the input layer size,
plus the size of the output layer.
• The number of hidden neurons should be less than twice the input
layer size.
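The three rules of thumb above can be evaluated for a given network as a quick sanity check. The sketch below is only an illustration: the function name, the dictionary keys, and the example sizes (9 inputs, 3 outputs) are assumptions, and the 2/3 rule is rounded to the nearest whole neuron.

```python
def hidden_neuron_guidelines(n_inputs, n_outputs):
    """Return the starting points suggested by the three rules of thumb."""
    low, high = sorted((n_inputs, n_outputs))
    return {
        # Rule 1: between the size of the input and output layers.
        "between_input_and_output": (low, high),
        # Rule 2: 2/3 of the input layer size plus the output layer size.
        "two_thirds_rule": round(2 * n_inputs / 3 + n_outputs),
        # Rule 3: strictly less than twice the input layer size.
        "upper_bound_exclusive": 2 * n_inputs,
    }


guidelines = hidden_neuron_guidelines(n_inputs=9, n_outputs=3)
print(guidelines)
```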
These three rules are only starting points that you may want to
consider. Ultimately the selection of the architecture of your neural
network will come down to trial and error. But what exactly is meant by
trial and error? You do not want to start throwing random layers and
numbers of neurons at your network. To do so would be very
time-consuming. There are two trial and error approaches that you may use
to organize the search for the optimum number of hidden neurons: the
"forward" and "backward" selection methods. The first method, the
"forward selection method", begins by selecting a small number of hidden
neurons. This method usually begins with only two hidden neurons. Then
the neural network is trained and tested. The number of hidden neurons is
then increased and the process is repeated so long as the overall results
of the training and testing improve. The "forward selection method" is
summarized in Figure 2.4.
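The forward selection loop described above can be sketched as follows. The evaluate argument is a hypothetical callable that would train and test a network with the given number of hidden neurons and return its error; here it is replaced by a toy error curve so the sketch can run on its own. The max_hidden limit is also an assumption.

```python
def forward_selection(evaluate, start=2, max_hidden=50):
    """Increase the hidden neuron count while the measured error improves.

    evaluate(n) is assumed to train and test a network with n hidden
    neurons and return its error (lower is better).
    """
    best_size = start
    best_error = evaluate(start)
    for size in range(start + 1, max_hidden + 1):
        error = evaluate(size)
        if error >= best_error:
            break  # results stopped improving, so stop adding neurons
        best_size, best_error = size, error
    return best_size, best_error


def toy_error(n_hidden):
    """Stand-in for training and testing: error is lowest at 6 neurons."""
    return (n_hidden - 6) ** 2 + 1.0


selected = forward_selection(toy_error)
print(selected)
```

In practice the error returned by evaluate is noisy, so one might require several consecutive non-improving steps before stopping; that refinement is left out of this sketch.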
Figure 2.4. Selecting the number of hidden neurons with forward
selection
2.1.9 Applications of NN:
• Prediction: learning from past experience
o pick the best stocks in the market
o predict weather
o identify people with cancer risk
• Classification
o Image processing
o Predict bankruptcy for credit card companies
o Risk assessment
• Recognition
o Pattern recognition: SNOOPE (bomb detector in U.S.
airports)
o Character recognition
o Handwriting: processing checks
• Data association
o Not only identify the characters that were scanned but identify
when the scanner is not working properly
• Data Filtering
o e.g. take the noise out of a telephone signal, signal
smoothing
• Planning
o Unknown environments
o Sensor data is noisy
o Fairly new approach to planning
Advantages:
• Adapt to unknown situations
• Robustness: fault tolerance due to network redundancy
• Autonomous learning and generalization
Disadvantages:
• Not exact
• Large complexity of the network structure
2.1.10 Problems not suited to a Neural Network:
Programs that are easily written out as a flowchart are an example
of programs that are not well suited to neural networks. If your program
consists of well defined steps, normal programming techniques will
suffice [9].
Another criterion to consider is whether the logic of your
program is likely to change. The ability of a neural network to learn is
one of its primary features. If the algorithm used to
solve your problem is an unchanging business rule there is no reason to
use a neural network. It might be detrimental to your program if the
neural network attempts to find a better solution, and begins to diverge
from the expected output of the program.
Finally, neural networks are often not suitable for problems where
you must know exactly how the solution was derived. A neural network
can become very useful for solving the problem for which the neural
network was trained. But the neural network cannot explain its reasoning.
The neural network knows because it was trained to know. The neural
network cannot explain how it followed a series of steps to derive the
answer.
2.1.11 Problems Suited to a Neural Network:
Although there are many problems that neural networks are not
suited for there are also many problems that a neural network is quite
useful for solving. In addition, neural networks can often solve problems
with fewer lines of code than a traditional programming algorithm. It is
important to understand what these problems are. Neural networks are
particularly useful for solving problems that cannot be expressed as a
series of steps, such as recognizing patterns, classifying into groups,
series prediction and data mining.
2.1.12 Validating Neural Networks:
Once a neural network has been trained it must be evaluated to see
if it is ready for actual use. This final step is important so that it can be
determined if additional training is required. To correctly validate a
neural network, validation data must be set aside that is completely
separate from the training data.
As an example, consider a classification network that must group
elements into three different classification groups. You are provided with
10,000 sample elements. For this sample data the group that each element
should be classified into is known. For such a system you would divide
the sample data into two groups of 5,000 elements. The first group would
form the training set. Once the network was properly trained the second
group of 5,000 elements would be used to validate the neural network.
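The 10,000-element example can be sketched as below. The samples are placeholders (an index paired with one of three group labels), and shuffling with a fixed seed before splitting is an assumption made so that the split is random yet reproducible.

```python
import random


def split_samples(samples, training_fraction=0.5, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 10,000 elements, each with a known group (0, 1 or 2).
samples = [(index, index % 3) for index in range(10000)]
training_set, validation_set = split_samples(samples)
print(len(training_set), len(validation_set))
```

No element appears in both sets, which is what keeps the validation error an honest estimate.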
It is very important that a separate group always be maintained for
validation. Training a neural network with a given sample set and
then using this same set to predict the anticipated error of the neural
network on a new arbitrary set will surely lead to bad results. The error
achieved using the training set will almost always be substantially lower
than the error on a new set of sample data. The integrity of the validation
data must always be maintained.
This brings up an important question. What exactly does happen if
the neural network that you have just finished training performs poorly on
the validation set? If this is the case, then you must examine what,
exactly, this means. It could mean that the initial random weights were
not good. Rerunning the training with new initial weights could correct
this. While an improper set of initial random weights could be the cause,
a more likely possibility is that the training data was not properly chosen.
If the validation is performing badly this most likely means that
there was data present in the validation set that was not available in the
training data. The way that this situation should be solved is by trying a
different, more random, way of separating the data into training and
validation sets. If this fails, you must combine the training and validation
sets into one large training set. Then new data must be acquired to serve
as the validation data [9].
For some situations it may be impossible to gather additional data
to use as either training or validation data. If this is the case then you are
left with no other choice but to combine all or part of the validation set
with the training set. While this approach will forgo the security of a good
validation, if additional data cannot be acquired this may be your only
alternative.
2.2 Introduction to Associative memories:
The associative memory models [10], an early class of neural
models that fit well with the view of cognition emerging
from today's brain neuro-imaging techniques, are inspired by the
capacity of human cognition to build calculus, which makes them a possible
link between connectionist models and classical artificial intelligence
developments.
Our memories function in an associative or content-addressable fashion.
That is, a memory does not exist in some isolated fashion, located in a
particular set of neurons. Rather, memories are stored in association with
one another. For example, the memory of a person draws on several
senses, such as sight and sound. These different sensory units lie in
completely separate parts of the brain, so it is clear that the memory of
the person must be distributed throughout the brain in some fashion. We
access the memory by its contents, not by where it is stored in the neural
pathways of the brain. This is very powerful; given even a poor
photograph of that person we are quite good at reconstructing the person's
face quite accurately. This is very different from a traditional
computer where specific facts are located in specific places in
computer memory. If only partial information is available about this
location, the fact or memory cannot be recalled at all.
Traditional measures of associative memory performance are
its memory capacity and content-addressability. Memory capacity
refers to the maximum number of associated pattern pairs that can be
stored and correctly retrieved while content-addressability is the ability
of the network to retrieve the correct stored pattern. Obviously, the
two performance measures are related to each other. It is known that
using Hebb's learning rule in building the connection weight matrix
of an associative memory yields a significantly low memory
capacity. Due to the limitation brought about by using Hebb's
learning rule, several modifications and variations have been proposed to
maximize the memory capacity [11].
2.2.1 Learning:
Learning is the way we acquire knowledge about the world around
us, and it is through this process of knowledge acquisition that the
environment alters our behavioral responses. Learning allows us to
store and retain knowledge; it builds our memories.
Aristotle made two observations about memory: first, the elementary
unit of memory is a sense image and second, association and links
between elementary memories serve as the basis for higher level
cognition. Memory stands for the elementary unit and association for
recollection between elementary units [11]. In a neurobiological
context, memory refers to the relatively enduring neural alterations
induced by the interaction of an organism with its environment.
Without such a change, there is no memory. The memory must be useful
and accessible to the nervous system in order to influence future behavior.
Memory and Learning are intricately connected. When a particular
activity pattern is learned, it is stored in the brain where it can be
recalled later when required. Learning encodes information. A system
learns a pattern if the system encodes the pattern in its structure. The
system structure changes as the system learns the information. So,
learning involves change.
That change can be represented in memory for future behavior.
Over the past century psychologists have studied learning based
on two fundamental paradigms: non-associative and associative. In
non-associative learning an organism acquires knowledge about the
properties of a single repetitive stimulus. In associative learning
[Edward Thorndike, B.F. Skinner], an organism acquires knowledge about
the relationship of either one stimulus to another, or one stimulus to the
organism's own behavioral response to that stimulus.
On a neuronal basis, memories are divided into two
distinct categories: STM (short term memory) and LTM (long term
memory). Inputs to the brain are processed into STMs, which last
at the most for a few minutes. Information is downloaded into LTMs
for more permanent storage. One of the most important functions of
our brain is the laying down and recall of memories. It is difficult to
imagine how we could function without both short and long term
memory. The absence of short term memory would render most
tasks extremely difficult if not impossible - life would be punctuated by
a series of one time images with no logical connection between
them. Equally, the absence of any means of long term memory would
ensure that we could not learn by past experience.
The acquisition of knowledge is an active, ongoing cognitive
process based on our perceptions. An important point about the learning
mechanism is that it distributes memories over different areas, making
them robust to damage. Distributed storage permits the brain to work
easily from partially corrupted information [11].
2.2.2 Associative Memory Model:
Associative memory maps [10, 12] data from an input space to data
in an output space. In general, this mapping is from unknown
domain points to known range points, where the memory learns an
underlying association from a training data set. For non-learning
memory models, which have their origin in additive neuronal
dynamics, connection strengths are "programmed" a priori
depending upon the associations that are to be encoded in the system.
Sometimes these memories are referred to as matrix associative
memories, because a connection matrix $W$ encodes associations
$(x_i, y_i)$, where $x_i$ is one of the programmed memories;
$y_i$ is then called the association of $x_i$. When $x_i$ and $y_i$ are in
different spaces, the model is a hetero-associative memory, i.e., it
associates two different vectors with one another. If $x_i = y_i$, then the
model is an auto-associative memory, i.e., it associates a vector with itself.
Associative memory models enjoy properties such as fault tolerance.
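The idea of a connection matrix that encodes associations can be illustrated with a minimal sketch. It assumes bipolar (+1/-1) vectors and an outer-product (Hebbian) encoding rule; the text does not name an encoding rule, so this choice, the function names `encode` and `recall`, and the example vectors are illustrative assumptions only.

```python
def encode(pairs):
    """Build W as the sum of outer products y x^T over all (x, y) pairs."""
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    W = [[0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += y[i] * x[j]
    return W


def recall(W, x):
    """Return sign(W x); a zero net input is mapped to +1 by convention."""
    out = []
    for row in W:
        net = sum(w * xj for w, xj in zip(row, x))
        out.append(1 if net >= 0 else -1)
    return out


# Hetero-associative: inputs (4-D) and outputs (2-D) live in different spaces.
x1, y1 = [1, -1, 1, -1], [1, -1]
x2, y2 = [1, 1, -1, -1], [-1, 1]
W_hetero = encode([(x1, y1), (x2, y2)])

# Auto-associative: each vector is associated with itself.
W_auto = encode([(x1, x1), (x2, x2)])

noisy_x1 = [1, -1, 1, 1]  # x1 with its last component flipped
print(recall(W_hetero, x1), recall(W_hetero, noisy_x1), recall(W_auto, x2))
```

The two stored inputs are orthogonal, which is what lets the hetero-associative recall tolerate the single flipped component in `noisy_x1`; with correlated patterns this simple rule degrades quickly.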
Types of Associative Neural Memories:
Associative neural memories are concerned with associative
learning and retrieval of information (vector patterns) in neural
networks. These networks represent one of the most extensively
analyzed classes of artificial neural networks. Several associative
neural memory models have been proposed over the last two decades.
These memory models can be classified in various ways, depending on:
• Architecture (static versus dynamic)
• Retrieval mode (synchronous versus asynchronous)
• Nature of the stored association (auto-associative versus hetero-associative)
• Complexity and capability of memory storage
Simple associative memories are static and have very low storage capacity, so
they cannot be used in applications where high memory capacity is required
[11]. Several modes can be used to update the states of the
units in both layers, namely synchronous, asynchronous, and a
combination of the two. In the synchronous updating scheme, the states
of the units in a layer are updated as a group prior to propagating the
output to the other layer. In asynchronous updating, units in both
layers are updated in some order, and the output is propagated to the
other layer after each unit update. Lastly, in synchronous-asynchronous
updating, there can be subgroups of units in each
layer that are updated synchronously while units in each subgroup are
updated asynchronously.
Dynamic associative memories such as Hopfield, Bidirectional
Associative Memory (BAM), and Brain-State-in-a-Box (BSB) are dynamical
memories, but they too support only very low storage capacity, so they
cannot be used in applications where high memory capacity is required.
For this reason, we chose the context-sensitive auto-associative
memory model for developing the expert system; it can also be
compared with machine learning algorithms such as back
propagation, Bayesian networks, C4.5, and Particle Swarm Optimization.
A simple model describing context-dependent associative
memories generates a good vectorial representation of basic logical
calculus. One of the strengths of this vectorial representation is the
very natural way in which binary matrix operators are able to
handle ambiguous situations. This fact is of biological
interest because of the very natural way in which the human mind is
able to make decisions in the presence of uncertainty. These
memories could also be used to develop expert agents for recent problem
domains. Holographic memories are being used to build many
advanced memory-based agents such as memory cards, USB drives,
etc. [11]. The advantage of using recurrent networks as associative
memory is their convergence to one of a finite number of stable states
when started at some initial state. The basic goals are:
• To be able to store as many exemplars as we need, each
corresponding to a different stable state of the network
• To have no other stable states
• To have the stable state that the network converges to be the one
closest to the applied pattern
The problems that we face with associative memories are:
• The capacity of the network is restricted
• Depending on the number and properties of the patterns to be
stored, some of the exemplars may not be stable states
• Some spurious stable states different from the exemplars may arise
by themselves
• The converged stable state may be other than the one closest to the
applied pattern
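The goals and problems listed above can be made concrete with a small recurrent (Hopfield-type) memory. The sketch below assumes bipolar patterns, Hebbian outer-product weights with a zero diagonal, and asynchronous unit-by-unit updating; these choices and the names `train_hopfield` and `recall_async` are illustrative assumptions, not the model adopted in this work.

```python
def train_hopfield(patterns):
    """Hebbian weights: sum of outer products of the patterns, zero diagonal."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def recall_async(W, state, max_sweeps=100):
    """Update one unit at a time until a full sweep changes nothing."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(W[i][j] * s[j] for j in range(n))
            new = s[i] if net == 0 else (1 if net > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break  # a stable state has been reached
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # p1 with its first component flipped
negated = [-v for v in p1]               # not an exemplar

print(recall_async(W, noisy) == p1)      # converges to the closest exemplar
print(recall_async(W, negated) == negated)  # a spurious stable state
```

The second check illustrates one of the listed problems: the negation of a stored exemplar is itself a stable state, even though it was never stored.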
2.3 Related work:
Not surprisingly, researchers have also tried to use neural networks
in Cryptography. A recent survey of the literature indicates that there has
been an increasing interest in the application of different classes of neural
networks to problems related to cryptography in the past few years.
Recent works have examined the use of neural networks in
cryptosystems. Typical examples include key management, generation
and exchange protocols; visual cryptography; pseudo random generators;
digital watermarking; and steganalysis [13].
2.3.1 Interacting neural network and cryptography:
The goal of any cryptographic system is the exchange of
information among the intended users without any leakage of information
to others who may have unauthorized access to it. A common secret key
can be created over a public channel accessible to any opponent, and neural
networks can be used to generate such a common secret key.
In neural cryptography, both communicating
networks receive an identical input vector, generate an output bit, and
are trained based on the output bit. The two networks and their weight
vectors exhibit a novel phenomenon, in which the networks synchronize to a
state with identical time-dependent weights. The secret key generated
over a public channel is used for encrypting and decrypting the
information being sent on the channel [14].
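The synchronization described above can be sketched with two small tree parity machines, the architecture commonly used in the neural cryptography literature. The text does not name the architecture or the learning rule, so the tree parity machine, the Hebbian update, the parameters `k`, `n`, `limit`, and all function names here are assumptions made for illustration. The sketch uses Python's `random` module, which is not cryptographically secure, so it is a demonstration only.

```python
import random


def tpm_output(weights, inputs):
    """Return the hidden-unit signs and the output bit (their product)."""
    sigmas = []
    for w_row, x_row in zip(weights, inputs):
        field = sum(w * x for w, x in zip(w_row, x_row))
        sigmas.append(1 if field > 0 else -1)  # a zero field is mapped to -1
    tau = 1
    for s in sigmas:
        tau *= s
    return sigmas, tau


def hebbian_update(weights, inputs, sigmas, tau, limit):
    """Move only hidden units that agree with tau; clip to [-limit, limit]."""
    for w_row, x_row, s in zip(weights, inputs, sigmas):
        if s == tau:
            for j, x in enumerate(x_row):
                w_row[j] = max(-limit, min(limit, w_row[j] + tau * x))


def synchronize(k=3, n=4, limit=3, seed=1, max_steps=100_000):
    """Train two machines on shared inputs until their weights coincide."""
    rng = random.Random(seed)  # NOT cryptographically secure; demo only

    def random_weights():
        return [[rng.randint(-limit, limit) for _ in range(n)] for _ in range(k)]

    wa, wb = random_weights(), random_weights()
    for step in range(1, max_steps + 1):
        x = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
        sig_a, tau_a = tpm_output(wa, x)
        sig_b, tau_b = tpm_output(wb, x)
        if tau_a == tau_b:  # only the output bits are exchanged publicly
            hebbian_update(wa, x, sig_a, tau_a, limit)
            hebbian_update(wb, x, sig_b, tau_b, limit)
        if wa == wb:
            return wa, wb, step
    return wa, wb, None


wa, wb, steps = synchronize()
print("synchronized after", steps, "steps")
```

Once the two weight sets coincide they stay identical, because both machines then see the same inputs, produce the same outputs, and apply the same updates; the shared weights can be used as key material.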
Based on chaotic neural networks, a Hash function can be
constructed, which makes use of neural networks' diffusion property
and chaos' confusion property. This function encodes the plaintext of
arbitrary length into the hash value of fixed length (typically, 128-bit,
256-bit or 512-bit). Theoretical analysis and experimental results show
that this hash function is one-way, with high key sensitivity and
plaintext sensitivity, and secure against birthday attacks or meet-in-the-
middle attacks. These properties make it a suitable choice for data
signature or authentication [15].
Neural cryptography deals with the problem of key exchange using
the mutual learning concept between two neural networks. The two
networks will exchange their outputs (in bits) so that the key between the
two communicating parties is eventually represented in the final learned
weights and the two networks are said to be synchronized. Security of
neural synchronization depends on the probability that an attacker can
synchronize with either of the two parties during the training process, so
decreasing this probability improves the reliability of exchanging their
output bits through a public channel [16].
Artificial neural networks have also been
used to classify functional blocks from a disassembled program as being
either cryptography related or not. The resulting system is referred to as
NNLC (Neural Net for Locating Cryptography) [17].
When training a neural network it is tempting to experiment with
architectures until a low total error is achieved. The danger in doing so is
the creation of a network that loses generality by over-learning the
training data; lower total error does not necessarily translate into a low
total error in validation. The resulting network may keenly detect the
samples used to train it, without being able to detect subtle variations in
new data. A method is presented for choosing the best neural network
architecture for a given data set based on observation of its accuracy,
precision, and mean square error [18].
The method relies on k-fold cross validation to evaluate
each network architecture k times to improve the reliability of the choice
of the optimal architecture. The need for four separate divisions of the
data set is demonstrated (testing, training, and validation, as normal, and
a comparison set). Instead of simply measuring the total error, the
resulting discrete measures of accuracy, precision, false positives, and
false negatives are used. This method is then applied to the problem of
locating cryptographic algorithms in compiled object code for two
different CPU architectures to demonstrate the suitability of the method.
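The two ingredients named above, k-fold splitting and discrete error measures, can be sketched as follows. This is only an illustration of those two ingredients, not the cited method itself: the four-way data division is not reproduced, and the function names `kfold_indices` and `confusion_measures` are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def confusion_measures(y_true, y_pred):
    """Accuracy, precision, and false positive/negative counts for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "false_positive": fp,
        "false_negative": fn,
    }


folds = list(kfold_indices(10, 3))
measures = confusion_measures([1, 0, 1, 0], [1, 1, 0, 0])
print(folds[0][1], measures)
```

In use, each candidate architecture would be trained on every `train` index set and scored on the matching `test` set, and the k sets of measures compared across architectures.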
2.4 Passwords:
Basics of Passwords: Passwords are at present the most common method
for verifying the identity of a user. Although this is a flawed method, systems
continue to use passwords because of their ease of use and ease of
implementation. Among the many problems are the successful guessing of
users' passwords, and the interception or uncovering of them online.
To prevent guessing and for additional security, the National Security
Agency (NSA) recommends using a random 8-letter password that is
regularly changed [19].
Since such a stream of passwords is almost impossible to
remember (certainly for me), the hapless user is forced to write these
passwords down, adding to the insecurity. Thus passwords need to be
protected by cryptographic techniques, whether they are stored or
transmitted. Several simple techniques can help make the old-fashioned
form of passwords easier to memorize. First, the system can present a
user with a list of possible random passwords from which to choose. With
such a choice, there may be one password that is easier for a given user to
remember. Second, the most common passwords are limited to 8
characters, and experience has shown that users have a hard time picking
such a short password that turns out to be secure.
If the system allows passwords of arbitrary length (fairly common
now), then users can employ pass phrases: a phrase or sentence that is not
going to be in dictionaries yet is easy for the given user to remember. My
favorite pass phrase is "Dexter's mother's bread," but I won't be able to
use it any more. Personal physical characteristics form the basis for a
number of identification methods now in use. The characteristics or
biometrics range from fingerprints to iris patterns, from voice to hand
geometry, among many examples. These techniques are outside the scope
of this book. The remaining two sections study two uses of one-way
functions to help secure passwords. A simple system password scheme
would just have a secret file holding each user's account name and the
corresponding password. There are several problems with this method: if
someone manages to read this file, they can immediately pretend to be
any of the users listed. Also, someone might find out about a user's likely
passwords from passwords used in the past.
For the reasons above and others, early UNIX systems protected
passwords with a one-way function (described in an earlier chapter).
Along with the account name, the one-way function applied to the
password is stored. Thus given a user A, with account name NA and
password PA, and given a fixed one-way function h, the system would
store NA and h (PA) as a table entry in the password file, with similar
entries for other users. When A supplies her password to the system, the
software computes h of her password and compares this result with the
table entry. In this way the systems administrators themselves will not
know the passwords of users and will not be able to impersonate a user.
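A minimal sketch of this table-of-hashes scheme follows. SHA-256 from Python's standard `hashlib` stands in for the one-way function h; this is an assumption for illustration (it is not the function early UNIX systems used), and the names `password_file`, `register`, and `verify` are hypothetical. An unsalted fast hash like this is shown only to mirror the scheme described, not as a recommended design.

```python
import hashlib
import hmac


def h(password):
    """Stand-in one-way function: SHA-256 of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


password_file = {}  # maps account name NA to h(PA); plaintext is never stored


def register(name, password):
    password_file[name] = h(password)


def verify(name, password):
    """Recompute h on the supplied password and compare with the table entry."""
    stored = password_file.get(name)
    if stored is None:
        return False
    return hmac.compare_digest(stored, h(password))


register("alice", "correct horse")
print(verify("alice", "correct horse"), verify("alice", "wrong guess"))
```

Because only h (PA) is stored, reading the table does not directly reveal any password, although it still allows the guessing attack discussed next.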
In early UNIX systems it was a matter of pride to make the
password file world readable. A user would try to guess others'
passwords by trying a guess P: first calculate h (P) and then compare this
with all table entries. There were many values of P to try, such as entries
in a dictionary, common names, special entries that are often used as
passwords, all short passwords, and all the above possibilities with
special characters at the beginning or the end. These "cracker" programs
have matured to the point where they can always find at least some
passwords if there are quite a few users in the system. Now the password
file is no longer public, but someone with root privileges can still get to it,
and it sometimes leaks out in other ways.
To make the attack in the previous paragraph harder (that attack is
essentially the same as ciphertext searching), systems can first choose
the one-way function h to be more execution-time intensive. This only
slows down the searches by a linear factor. Another approach uses an
additional random table entry, called a salt. Suppose, for example, that
each password table entry has another random t-bit field (the salt),
different for each password. When Alice first puts her password into the
system (or changes it), she supplies PA. The system chooses the salt SA and
calculates EA = h (PA, SA), where h is adapted to handle two inputs
instead of one.
The password file entry for Alice now contains A, SA, and EA.
With this change, an attack on a single user is the same, but the attack of
the previous paragraph on all users at the same time now takes an
extra factor of time equal to either $2^t$ or the number of users, whichever is
smaller. Without the salt, an attacker could check if "Dexter" were the
password of any user by calculating h ("Dexter") and doing a fast search
of the password file for this entry. With the salt, to check if Alice is using
"Dexter" for example, the attacker must retrieve Alice's salt SA and
calculate h ("Dexter", SA). Each user requires a different calculation, so
this simple device greatly slows down the dictionary attack.
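The salted scheme, combined with a deliberately slow one-way function, can be sketched as below. PBKDF2-HMAC-SHA256 from Python's standard `hashlib` is used as the two-input function h (PA, SA); that choice, the 16-byte salt, the iteration count, and the names `make_entry` and `check` are assumptions for illustration rather than anything prescribed by the text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # makes each evaluation of h slower (assumed value)


def h(password, salt):
    """Two-input one-way function: PBKDF2-HMAC-SHA256 of password and salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_entry(name, password, salt_bytes=16):
    """Create a password-file entry (name, SA, EA) with a fresh random salt."""
    salt = os.urandom(salt_bytes)
    return {"name": name, "salt": salt, "hash": h(password, salt)}


def check(entry, guess):
    """A guess must be hashed with this entry's own salt before comparing."""
    return hmac.compare_digest(entry["hash"], h(guess, entry["salt"]))


alice = make_entry("alice", "Dexter")
bob = make_entry("bob", "Dexter")

# Same password, different salts: the stored values differ, so one
# precomputed h("Dexter") can no longer be matched against every entry.
print(alice["hash"] != bob["hash"], check(alice, "Dexter"), check(bob, "dexter"))
```

An attacker testing the guess "Dexter" against both entries has to run the slow function once per entry, which is the per-user cost described above.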
Text Password:
Password strength is a measure of the effectiveness of
a password in resisting guessing and brute-force attacks. In its usual form,
it estimates how many trials an attacker who does not have direct access
to the password would need, on average, to guess it correctly. The
strength of a password is a function of length, complexity, and
unpredictability [20].
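As a rough illustration of how length and alphabet size enter such an estimate, the sketch below assumes every character is chosen uniformly and independently at random, which user-chosen passwords generally are not, so it gives an upper bound on strength. The function names `entropy_bits` and `average_guesses` are hypothetical.

```python
import math


def entropy_bits(length, alphabet_size):
    """Bits of entropy of a uniformly random password."""
    return length * math.log2(alphabet_size)


def average_guesses(length, alphabet_size):
    """About half the search space must be tried on average."""
    return alphabet_size ** length / 2


# Example: a random 8-letter password drawn from 26 lowercase letters.
print(round(entropy_bits(8, 26), 1), average_guesses(8, 26))
```

Lengthening the password raises the exponent, while enlarging the alphabet raises the base, which is why length usually contributes more to strength than adding a few extra symbol types.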
However, other attacks on passwords can succeed without a brute
search of every possible password. For instance, knowledge about a user
may suggest possible passwords (such as pet names, children's names,
etc.). Hence estimates of password strength must also take into account
resistance to other attacks. Using strong passwords lowers the
overall risk of a security breach, but strong passwords do not replace the
need for other effective security controls. The effectiveness of a password
of a given strength is strongly determined by the design and
implementation of the authentication system software, particularly how
frequently password guesses can be tested by an attacker and how
securely information on user passwords is stored and transmitted. Risks
are also posed by several means of breaching computer security which