CHAPTER 2 LITERATURE SURVEY - Shodhganga (shodhganga.inflibnet.ac.in/bitstream/10603/8270/11/11_chapter 2.pdf)

Mar 05, 2018

CHAPTER 2

LITERATURE SURVEY

2.1 Neural Networks Basics:

An Artificial Neural Network (ANN) is an information processing

paradigm that is inspired by the way biological nervous systems, such as

the brain, process information. The key element of this paradigm is the

novel structure of the information processing system. It is composed of a

large number of highly interconnected processing elements (neurons)

working in unison to solve specific problems. ANNs, like people, learn

by example. An ANN is configured for a specific application, such as

pattern recognition or data classification, through a learning process.

Learning in biological systems involves adjustments to the synaptic

connections that exist between the neurons. This is true of ANNs as well.

2.1.1 Why Use Neural Networks:

Neural networks, with their remarkable ability to derive meaning

from complicated or imprecise data, can be used to extract patterns and

detect trends that are too complex to be noticed by either humans or other

computer techniques. A trained neural network can be thought of as an

"expert" in the category of information it has been given to analyze. This

expert can then be used to provide projections given new situations of

interest and answer "what if" questions.

2.1.2 Advantages of ANN:

i. Adaptive learning: An ability to learn how to do tasks based on

the data given for training or initial experience.

ii. Self-Organization: An ANN can create its own organization or

representation of the information it receives during learning time.

iii. Real Time Operation: ANN computations may be carried out in

parallel, and special hardware devices are being designed and

manufactured which take advantage of this capability.

iv. Fault Tolerance via Redundant Information Coding: Partial

destruction of a network leads to the corresponding degradation of

performance. However, some network capabilities may be retained

even with major network damage.

2.1.3 Neural Networks versus Conventional Computers:

Neural networks take a different approach to problem solving than

that of conventional computers. Conventional computers use an

algorithmic approach i.e. the computer follows a set of instructions in

order to solve a problem. Unless the specific steps that the computer

needs to follow are known the computer cannot solve the problem. That

restricts the problem solving capability of conventional computers to

problems that we already understand and know how to solve. But

computers would be so much more useful if they could do things that we

don't exactly know how to do.

Neural networks process information in a way similar to how the human

brain does. The network is composed of a large number of highly

interconnected processing elements (neurons) working in parallel to solve

a specific problem. Neural networks learn by example. They cannot be

programmed to perform a specific task. The examples must be selected

carefully; otherwise useful time is wasted or, even worse, the network

might function incorrectly. The disadvantage is that because the

network finds out how to solve the problem by itself, its operation can be

unpredictable.

On the other hand, conventional computers use a cognitive

approach to problem solving; the way the problem is to be solved must be

known and stated in small unambiguous instructions. These instructions

are then converted to a high level language program and then into

machine code that the computer can understand. These machines are

totally predictable; if anything goes wrong, it is due to a software or

hardware fault.

Neural networks and conventional algorithmic computers are not in

competition but complement each other. There are tasks that are more suited

to an algorithmic approach, like arithmetic operations, and tasks that are

more suited to neural networks. Moreover, a large number of tasks

require systems that use a combination of the two approaches (normally a

conventional computer is used to supervise the neural network) in order

to perform at maximum efficiency.

2.1.4 The Neuron:

The neuron is the basic building block of the neural network. A

neuron is a communication conduit that both accepts input and produces

output. The neuron receives its input either from other neurons or the user

program. Similarly, the neuron sends its output to other neurons or the

user program.

Figure 2.1. Mathematical representation of a Neuron
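The neuron described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `neuron_output`, the optional `bias` term and the choice of a sigmoid transfer function are assumptions made for the example, not details given in this chapter.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Combine the inputs into a weighted sum and squash it with a sigmoid.

    `bias` and the sigmoid are assumptions for this sketch; the text only
    says that a neuron accepts input and produces output.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Inputs may come from other neurons or from the user program.
print(neuron_output([1.0, 0.0], [0.5, -0.5]))
```

The returned value can in turn be passed on as an input to other neurons or handed back to the user program.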

The commonest type of artificial neural network consists of three groups,

or layers, of units: a layer of "input" units is connected to a layer of

"hidden" units, which is connected to a layer of "output" units. (See

Figure 2.2)

• The activity of the input units represents the raw information that is fed into the network.

• The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.

• The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.

This simple type of network is interesting because the hidden units

are free to construct their own representations of the input. The weights

between the input and hidden units determine when each hidden unit is

active, and so by modifying these weights, a hidden unit can choose what

it represents.
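A minimal sketch of how activity flows through such a three-layer network is given below. The weight values, the layer sizes (2 input, 3 hidden, 1 output) and the use of a sigmoid for every unit are assumptions chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(activities, weights):
    """weights[j][i] is the weight from unit i of the previous layer to unit j."""
    return [sigmoid(sum(w * a for w, a in zip(row, activities)))
            for row in weights]

def forward(inputs, w_hidden, w_output):
    # Hidden activity depends on the inputs and the input-to-hidden weights.
    hidden = layer(inputs, w_hidden)
    # Output behavior depends on the hidden activity and hidden-to-output weights.
    return hidden, layer(hidden, w_output)

# Illustrative weights only: 2 input units, 3 hidden units, 1 output unit.
w_hidden = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
w_output = [[1.0, -1.0, 0.5]]

hidden, output = forward([1.0, 0.0], w_hidden, w_output)
print(hidden, output)
```

Changing a row of `w_hidden` changes when the corresponding hidden unit becomes active, which is the sense in which a hidden unit "chooses" what it represents.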

We also distinguish single-layer and multi-layer architectures. The

single-layer organization, in which all units are connected to one another,

constitutes the most general case and is of more potential computational

power than hierarchically structured multi-layer organizations. In multi-

layer networks, units are often numbered by layer, instead of following a

global numbering.

Figure 2.2. Multi layer Architecture

2.1.5 Neuron Connection Weights:

The previous section already mentioned that neurons are usually

connected together. These connections are not equal, and can be assigned

individual weights. These weights are what give the neural network the

ability to recognize certain patterns. Adjust the weights, and the neural

network will recognize a different pattern.

Adjustment of these weights is a very important operation. Later

chapters will show you how neural networks can be trained. The process

of training is adjusting the individual weights between each of the

individual neurons until we achieve close to the desired output.

2.1.6 The Learning Process:

The memorization of patterns and the subsequent response of the network

can be categorized into two general paradigms [8]:

• Associative mapping, in which the network learns to produce a particular pattern on the set of output units whenever another particular pattern is applied on the set of input units. The associative mapping can generally be broken down into two mechanisms:

  - Auto-association: an input pattern is associated with itself and the states of input and output units coincide. This is used to provide pattern completion, i.e. to produce a pattern whenever a portion of it or a distorted pattern is presented.

  - Hetero-association: the network stores pairs of patterns, building an association between two sets of patterns. It is related to two recall mechanisms:

    o nearest-neighbor recall, where the output pattern produced corresponds to the stored input pattern which is closest to the pattern presented, and

    o interpolative recall, where the output pattern is a similarity-dependent interpolation of the patterns stored corresponding to the pattern presented.

  Yet another paradigm, which is a variant of associative mapping, is classification, i.e. when there is a fixed set of categories into which the input patterns are to be classified.

• Regularity detection, in which units learn to respond to particular properties of the input patterns. Whereas in associative mapping the network stores the relationships among patterns, in regularity detection the response of each unit has a particular 'meaning'. This type of learning mechanism is essential for feature discovery and knowledge representation.
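Nearest-neighbor recall can be illustrated without any network at all, as a lookup over stored pattern pairs. The sketch below assumes binary patterns and uses the Hamming distance as the measure of closeness; both choices, and the function names, are assumptions for the example.

```python
def hamming(a, b):
    """Number of positions in which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_recall(stored_pairs, presented):
    """Return the output pattern whose stored input is closest to `presented`."""
    _, best_output = min(stored_pairs,
                         key=lambda pair: hamming(pair[0], presented))
    return best_output

# Two stored (input pattern, output pattern) pairs.
pairs = [((1, 1, 0, 0), "A"), ((0, 0, 1, 1), "B")]

# A distorted version of the first input pattern still recalls "A".
print(nearest_neighbor_recall(pairs, (1, 0, 0, 0)))
```

An associative network would achieve a comparable effect through its weights rather than through an explicit search over stored pairs.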

Every neural network possesses knowledge which is contained in

the values of the connection weights. Modifying the knowledge stored in

the network as a function of experience implies a learning rule for

changing the values of the weights.

All learning methods used for adaptive neural networks can be classified

into two major categories:

• Supervised learning, which incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be. During the learning process global information may be required. Paradigms of supervised learning include error-correction learning, reinforcement learning and stochastic learning. An important issue concerning supervised learning is the problem of error convergence, i.e. the minimization of error between the desired and computed unit values. The aim is to determine a set of weights which minimizes the error. One well-known method, which is common to many learning paradigms, is the least mean square (LMS) convergence.

• Unsupervised learning, which uses no external teacher and is based upon only local information. It is also referred to as self-organization, in the sense that it self-organizes data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning are Hebbian learning and competitive learning.

A neural network learns on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line.
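As a rough illustration of error-correction learning with LMS convergence, the sketch below trains a single linear unit with the delta rule. The learning rate, the number of epochs and the training samples are hypothetical values chosen for the example; the chapter itself does not give an update formula.

```python
def lms_train(samples, learning_rate=0.1, epochs=200):
    """Adjust the weights of one linear unit to reduce the squared error.

    samples: list of (inputs, desired_output) pairs supplied by the "teacher".
    """
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, desired in samples:
            computed = sum(w * x for w, x in zip(weights, inputs))
            error = desired - computed
            # Delta rule: move each weight in proportion to error * input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights

# Hypothetical data generated by the rule: desired = 2*x1 - 1*x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = lms_train(samples)
print(weights)  # close to [2.0, -1.0]
```

Because the desired outputs here are exactly reproducible by a linear unit, the error can be driven towards zero; with noisy data the weights would settle near the least-squares solution instead.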

Transfer Function [8]:

The behavior of an ANN (Artificial Neural Network) depends on

both the weights and the input-output function (transfer function) that is

specified for the units. This function typically falls into one of three

categories: linear (or ramp), threshold and sigmoid.

For linear units, the output activity is proportional to the total weighted

input. For threshold units, the output is set at one of two levels,

depending on whether the total input is greater than or less than some

threshold value. For sigmoid units, the output varies continuously but not

linearly as the input changes. Sigmoid units bear a greater resemblance to

real neurons than do linear or threshold units, but all three must be

considered rough approximations.
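The three categories of transfer function can be written down directly. In this sketch the parameter names `slope` and `theta`, and the output levels 0 and 1 for the threshold unit, are assumptions made for illustration.

```python
import math

def linear(total_input, slope=1.0):
    """Output proportional to the total weighted input."""
    return slope * total_input

def threshold(total_input, theta=0.0):
    """Output at one of two levels, depending on the threshold theta."""
    return 1.0 if total_input > theta else 0.0

def sigmoid(total_input):
    """Output varies continuously, but not linearly, with the input."""
    return 1.0 / (1.0 + math.exp(-total_input))

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), threshold(x), sigmoid(x))
```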

2.1.7 Error Calculation [9]:

Error calculation is an important aspect of any neural network.

Whether the neural network is supervised or unsupervised, an error rate

must be calculated. The goal of virtually all training algorithms is to

minimize the error. In this section we will examine how the error is

calculated for a supervised neural network. We will also discuss how the

error is determined for an unsupervised training algorithm. We will begin

this section by discussing two error calculation steps used for supervised

training.

Error Calculation and Supervised Training [9]:

Error calculation is an important part of the supervised training

algorithm. In this section we will examine an error calculation method

that can be employed by supervised training. For supervised training

there are two components to the error that must be considered. First, we

must calculate the error for each element of the training set as it is processed.

Secondly, we must take the average across all samples in the training

set. For example, the XOR problem has only four items in its training

set. An output error would be calculated on each element of the training

set. Finally, after all elements have been processed, the root mean

square (RMS) error is determined.

Output Error:

The output error is simply an error calculation that is done to

determine how far off a neural network's output was from the ideal

output. This value is rarely used for any purpose other than as a stepping

stone on the way to the calculation of root mean square (RMS) error.

Once all elements of the training set have been used, the RMS error can be calculated.

This error acts as the global error for the entire neural network.
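The two steps can be sketched for the four-element XOR training set as follows. The "actual" outputs are made-up values standing in for what a partially trained network might produce, and the function names are assumptions for the example.

```python
import math

def output_error(actual, ideal):
    """Sum of squared differences for one element of the training set."""
    return sum((a - i) ** 2 for a, i in zip(actual, ideal))

def rms_error(actual_outputs, ideal_outputs):
    """Root mean square error accumulated over the whole training set."""
    total = 0.0
    count = 0
    for actual, ideal in zip(actual_outputs, ideal_outputs):
        total += output_error(actual, ideal)
        count += len(ideal)
    return math.sqrt(total / count)

# Ideal XOR outputs for the inputs (0,0), (0,1), (1,0), (1,1).
ideal = [[0.0], [1.0], [1.0], [0.0]]
# Hypothetical outputs of a partially trained network.
actual = [[0.1], [0.9], [0.8], [0.2]]

print(rms_error(actual, ideal))
```

Here the squared differences sum to 0.10 over four outputs, so the RMS error is the square root of 0.025, roughly 0.158.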

2.1.8 A Feed Forward Neural Network:

A "feed forward" neural network [9] is similar to the types of neural

networks that we have already examined. Just like many other neural

network types the feed forward neural network begins with an input layer.

This input layer must be connected to a hidden layer. This hidden

layer can then be connected to another hidden layer or directly to the

output layer. There can be any number of hidden layers so long as at least

one hidden layer is provided. In common use most neural networks will

have only one hidden layer. It is very rare for a neural network to have

more than two hidden layers. We will now examine, in detail, the

structure of a "feed forward neural network".

The Structure of a Feed Forward Neural Network:

A "feed forward" neural network differs from the neural networks

previously examined. Figure 2.3 shows a typical feed forward neural

network with a single hidden layer.

Figure 2.3. A typical feed forward neural network with a single hidden

layer

Choosing the Network Structure:

As we saw in the previous section, there are many ways that feed

forward neural networks can be constructed. You must decide how many

neurons will be inside the input and output layers. You must also decide

how many hidden layers you're going to have, as well as how many

neurons will be in each of these hidden layers.

There are many techniques for choosing these parameters. In this

section we will cover some of the general "rules of thumb" that you can

use to assist you in these decisions. Rules of thumb will only take you so

far. In nearly all cases some experimentation will be required to

determine the optimal structure for your "feed forward neural network".

The Input Layer:

The input layer to the neural network is the conduit through which

the external environment presents a pattern to the neural network. Once a

pattern is presented to the input layer of the neural network, the output

layer will produce another pattern. In essence this is all the neural

network does. The input layer should represent the condition for which

we are training the neural network. Every input neuron should

represent some independent variable that has an influence over the output

of the neural network.

It is important to remember that the inputs to the neural network

are floating point numbers. These values are expressed as the primitive

Java data type "double". This is not to say that you can only process

numeric data with the neural network. If you wish to process a form of

data that is non-numeric you must develop a process that normalizes this

data to a numeric representation.
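One common way to turn non-numeric data into numbers is one-of-n encoding, where each category gets its own input neuron. The text does not prescribe a particular scheme, so the sketch below, including the function name `one_of_n` and the color categories, is only an assumed example.

```python
def one_of_n(value, categories):
    """Encode a categorical value as floats: 1.0 for its category, else 0.0."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]

colors = ["red", "green", "blue"]  # hypothetical non-numeric input
print(one_of_n("green", colors))   # [0.0, 1.0, 0.0]
```

Each element of the resulting list would feed one input neuron, so this encoding needs as many input neurons as there are categories.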

The Output Layer:

The output layer of the neural network is what actually presents a

pattern to the external environment. Whatever pattern is presented by the

output layer can be directly traced back to the input layer. The number of

output neurons should be directly related to the type of work that the

neural network is to perform.

To choose the number of neurons to use in your output layer you

must consider the intended use of the neural network. If the neural

network is to be used to classify items into groups, then it is often

preferable to have one output neuron for each group that the item is to be

assigned into. If the neural network is to perform noise reduction on a

signal then it is likely that the number of input neurons will match the

number of output neurons. In this sort of neural network you would

want the patterns to leave the neural network in the same

format as they entered.
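For the classification case, with one output neuron per group, a simple way to read the result is to pick the group whose neuron has the largest output. This "winner takes all" reading is an assumption for the sketch, as are the group names.

```python
def classify(outputs, groups):
    """Return the group whose output neuron has the highest activity."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return groups[winner]

groups = ["cat", "dog", "bird"]              # hypothetical groups
print(classify([0.1, 0.7, 0.2], groups))     # dog
```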

The Number of Hidden Layers:

There are really two decisions that must be made with regards to

the hidden layers. The first is how many hidden layers to actually have in

the neural network. Secondly, you must determine how many neurons

will be in each of these layers. We will first examine how to determine

the number of hidden layers to use with the neural network.

Neural networks with two hidden layers can represent functions

with any kind of shape. There is currently no theoretical reason to use

neural networks with any more than two hidden layers. Further, for many

practical problems there's no reason to use any more than one hidden

layer. Problems that require two hidden layers are rarely encountered.

Differences between the numbers of hidden layers are summarized in

Table 2.1.

Number of Hidden Layers | Result
None | Only capable of representing linearly separable functions or decisions.
1 | Can approximate arbitrarily well any function which contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

Table 2.1: Determining the number of hidden layers

Just deciding the number of hidden neuron layers is only a small

part of the problem. You must also determine how many neurons will be

in each of these hidden layers. This process is covered in the next section.

The Number of Neurons in the Hidden Layers:

Deciding the number of neurons in the hidden layers is a very

important part of deciding your overall neural network architecture.

Though these layers do not directly interact with the external environment

these layers have a tremendous influence on the final output. Both the

number of hidden layers and number of neurons in each of these hidden

layers must be considered.

Using too few neurons in the hidden layers will result in something

called underfitting. Underfitting occurs when there are too few neurons

in the hidden layers to adequately detect the signals in a complicated data

set. Using too many neurons in the hidden layers can result in several

problems. First, too many neurons in the hidden layers may result in

overfitting. Overfitting occurs when the neural network has so much

information processing capacity that the limited amount of information

contained in the training set is not enough to train all of the neurons in the

hidden layers [9].

A second problem can occur even when there is sufficient training

data. An inordinately large number of neurons in the hidden layers can

increase the time it takes to train the network. The amount of training

time can increase enough so that it is impossible to adequately train the

neural network. Obviously some compromise must be reached between

too many and too few neurons in the hidden layers. There are many

rule-of-thumb methods for determining the correct number of neurons to

use in the hidden layers.

Some of them are summarized as follows.

• The number of hidden neurons should be in the range between the size

of the input layer and the size of the output layer.

• The number of hidden neurons should be 2/3 of the input layer size,

plus the size of the output layer.

• The number of hidden neurons should be less than twice the input

layer size.
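The three rules of thumb listed above are easy to evaluate for a given input and output layer size. The helper below is a hypothetical convenience function; rounding the 2/3 rule to the nearest whole neuron is an assumption, since the rule itself does not say how to round.

```python
def hidden_neuron_rules(n_input, n_output):
    """Evaluate the three rules of thumb for the number of hidden neurons."""
    low, high = sorted((n_input, n_output))
    return {
        # Rule 1: between the size of the input and output layers.
        "between_layers": (low, high),
        # Rule 2: 2/3 of the input layer size plus the output layer size.
        "two_thirds_rule": round(2 * n_input / 3 + n_output),
        # Rule 3: stay below twice the input layer size.
        "upper_bound": 2 * n_input,
    }

print(hidden_neuron_rules(9, 3))
```

For 9 input and 3 output neurons this suggests somewhere between 3 and 9 hidden neurons, 9 by the 2/3 rule, and fewer than 18 in any case.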

These three rules are only starting points that you may want to consider. Ultimately the selection of the architecture of your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random layers and numbers of neurons at your network. To do so would be very time-consuming. There are two methods that can be used to organize your trial and error search for the optimum network architecture: the "forward" and "backward" selection methods.

The first method, the "forward selection method", begins by selecting a small number of hidden neurons. This method usually begins with only two hidden neurons. Then the neural network is trained and tested. The number of hidden neurons is then increased and the process is repeated so long as the overall results of the training and testing improve. The "forward selection method" is summarized in Figure 2.4.
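The forward selection loop can be sketched as follows. The `evaluate` argument stands for "train and test a network with this many hidden neurons and return its error"; here it is replaced by a toy function, so `toy_evaluate`, `max_hidden` and the error values are all hypothetical.

```python
def forward_selection(evaluate, start=2, max_hidden=50):
    """Grow the hidden layer one neuron at a time while the error improves."""
    best_n = start
    best_error = evaluate(start)
    for n in range(start + 1, max_hidden + 1):
        error = evaluate(n)
        if error >= best_error:
            break  # results stopped improving, so keep the previous size
        best_n, best_error = n, error
    return best_n, best_error

def toy_evaluate(n_hidden):
    # Stand-in for training and testing a real network; by construction
    # the error is lowest at six hidden neurons.
    return (n_hidden - 6) ** 2 + 1.0

print(forward_selection(toy_evaluate))  # (6, 1.0)
```

In practice each call to `evaluate` is a full training run, which is why starting small and stopping at the first non-improvement keeps the search affordable.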

Figure 2.4. Selecting the number of hidden neurons with forward selection

2.1.9 Applications of NN:

• Prediction: learning from past experience
o pick the best stocks in the market
o predict weather
o identify people with cancer risk

• Classification
o Image processing
o Predict bankruptcy for credit card companies
o Risk assessment

• Recognition
o Pattern recognition: SNOOPE (bomb detector in U.S. airports)
o Character recognition
o Handwriting: processing checks

• Data association
o Not only identify the characters that were scanned but identify when the scanner is not working properly

• Data Filtering
o e.g. take the noise out of a telephone signal, signal smoothing

• Planning
o Unknown environments
o Sensor data is noisy
o Fairly new approach to planning

Advantages:

• Adapt to unknown situations
• Robustness: fault tolerance due to network redundancy
• Autonomous learning and generalization

Disadvantages:

• Not exact
• Large complexity of the network structure

2.1.10 Problems not suited to a Neural Network:

Programs that are easily written out as a flowchart are an example

of programs that are not well suited to neural networks. If your program

consists of well defined steps, normal programming techniques will

suffice [9]. Another criterion to consider is whether the logic of your

program is likely to change. The ability for a neural network to learn is

one of the primary features of the neural network. If the algorithm used to

solve your problem is an unchanging business rule there is no reason to

use a neural network. It might be detrimental to your program if the

neural network attempts to find a better solution, and begins to diverge

from the expected output of the program.

Finally, neural networks are often not suitable for problems where

you must know exactly how the solution was derived. A neural network

can become very useful for solving the problem for which the neural

network was trained. But the neural network cannot explain its reasoning.

The neural network knows because it was trained to know. The neural

network cannot explain how it followed a series of steps to derive the

answer.

2.1.11 Problems Suited to a Neural Network:

Although there are many problems that neural networks are not

suited for there are also many problems that a neural network is quite

useful for solving. In addition, neural networks can often solve problems

with fewer lines of code than a traditional programming algorithm. It is

important to understand what these problems are. Neural networks are

particularly useful for solving problems that cannot be expressed as a

series of steps, such as recognizing patterns, classifying into groups,

series prediction and data mining.

2.1.12 Validating Neural Networks:

Once a neural network has been trained it must be evaluated to see

if it is ready for actual use. This final step is important so that it can be

determined if additional training is required. To correctly validate a

neural network, validation data must be set aside that is completely

separate from the training data.

As an example, consider a classification network that must group

elements into three different classification groups. You are provided with

10,000 sample elements. For this sample data the group that each element

should be classified into is known. For such a system you would divide

the sample data into two groups of 5,000 elements. The first group would

form the training set. Once the network was properly trained the second

group of 5,000 elements would be used to validate the neural network.
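The split described above might look like the sketch below. Shuffling before splitting is an assumption added so that the two groups are drawn at random; the sample data, the `seed` parameter and the function name are likewise hypothetical.

```python
import random

def split_samples(samples, n_train, seed=None):
    """Shuffle a copy of the samples and split it into two disjoint groups."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 10,000 hypothetical (element, known group) pairs with three groups.
samples = [(i, i % 3) for i in range(10000)]
training, validation = split_samples(samples, 5000, seed=42)

print(len(training), len(validation))  # 5000 5000
```

No element appears in both groups, which is what keeps the validation error an honest estimate of performance on new data.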

It is very important that a separate group always be maintained for

validation. Training a neural network with a given sample set and

also using this same set to predict the anticipated error of the neural

network on a new arbitrary set will surely lead to bad results. The error

achieved using the training set will almost always be substantially lower

than the error on a new set of sample data. The integrity of the validation

data must always be maintained.

This brings up an important question. What exactly does happen if

the neural network that you have just finished training performs poorly on

the validation set? If this is the case, then you must examine what,

exactly, this means. It could mean that the initial random weights were

not good. Rerunning the training with new initial weights could correct

this. While an improper set of initial random weights could be the cause,

a more likely possibility is that the training data was not properly chosen.

If the validation is performing badly this most likely means that

there was data present in the validation set that was not available in the

training data. The way that this situation should be solved is by trying a

different, more random, way of separating the data into training and

validation sets. If this fails, you must combine the training and validation

sets into one large training set. Then new data must be acquired to serve

as the validation data [9].

For some situations it may be impossible to gather additional data

to use as either training or validation data. If this is the case then you are

left with no other choice but to combine all or part of the validation set

with the training set. While this approach will forgo the security of a good

validation, if additional data cannot be acquired this may be your only

alternative.

2.2 Introduction to Associative memories:

The associative memory models [10], an early class of neural

models that fit perfectly well with the vision of cognition emerging

from today's brain neuro-imaging techniques, are inspired by the

capacity of human cognition to build calculus, which makes them a possible

link between connectionist models and classical artificial intelligence

developments.

Our memories function in an associative or content-addressable fashion.

That is, a memory does not exist in some isolated fashion, located in a

particular set of neurons. Thus memories are stored in association with

one another. These different sensory units lie in completely separate

parts of the brain, so it is clear that the memory of the person must

be distributed throughout the brain in some fashion. We access the

memory by its contents not by where it is stored in the neural

pathways of the brain. This is very powerful; given even a poor

photograph of that person we are quite good at reconstructing the persons

face quite accurately. This is very different from a traditional

computer where specific facts are located in specific places in

computer memory. If only partial information is available about this

location, the fact or memory cannot be recalled at all.

Traditional measures of associative memory performance are

its memory capacity and content-addressability. Memory capacity

refers to the maximum number of associated pattern pairs that can be

stored and correctly retrieved while content-addressability is the ability

of the network to retrieve the correct stored pattern. The two performance measures are related to each other. It is known that building the connection weight matrix of an associative memory with Hebb's learning rule yields a notably low memory capacity. Because of this limitation of Hebb's


learning rule, several modifications and variations have been proposed to

maximize the memory capacity [11].
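As a rough illustration of Hebb's rule in this setting, the sketch below builds an auto-associative weight matrix as a sum of outer products of bipolar patterns and recalls a stored pattern from a corrupted cue. The function names, the zeroed diagonal, the single threshold step, and the two example patterns are assumptions chosen for the example; the cited works may use a different formulation.

```python
def hebbian_weights(patterns):
    """Sum of outer products of the stored bipolar (+1/-1) patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue):
    """One thresholded pass: each unit takes the sign of its weighted input."""
    return [1 if sum(wij * xj for wij, xj in zip(row, cue)) >= 0 else -1
            for row in w]

stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
w = hebbian_weights(stored)

noisy = list(stored[0])
noisy[0] = -noisy[0]                  # corrupt one element of the first pattern
recalled = recall(w, noisy)
print(recalled == stored[0])          # True: the stored pattern is recovered
```

The two patterns here are orthogonal, which is the favourable case. As more, or more correlated, patterns are added to the same matrix, recall begins to fail, which is the capacity limitation noted above.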

2.2.1 Learning:

Learning is the way we acquire knowledge about the world around

us, and it is through this process of knowledge acquisition, that the

environment alters our behavioral responses. Learning allows us to

store and retain knowledge; it builds our memories.

Aristotle made two observations about memory: first, the elementary unit of memory is a sense image, and second, associations and links between elementary memories serve as the basis for higher-level cognition. Memory stands for the elementary unit and association for recollection between elementary units [11]. In a neurobiological context, memory refers to the relatively enduring neural alterations induced by the interaction of an organism with its environment. Without such a change, there is no memory. The memory must be useful and accessible to the nervous system so that it can influence future behavior.

Memory and Learning are intricately connected. When a particular

activity pattern is learned, it is stored in the brain where it can be

recalled later when required. Learning encodes information. A system

learns a pattern if the system encodes the pattern in its structure. The

system structure changes as the system learns the information. So,

learning involves change.

That change can be represented in memory for future behavior.

Over the past century, psychologists have studied learning based on two fundamental paradigms: non-associative and associative. In non-associative learning, an organism acquires the properties of a single repetitive stimulus. In associative learning [Edward Thorndike, B.F. Skinner], an organism acquires knowledge about the relationship of


either one stimulus to another, or one stimulus to the organism's own behavioral response to that stimulus.

On a neuronal basis, memories are divided into two distinct categories: STM (short-term memory) and LTM (long-term memory). Inputs to the brain are processed into STMs, which last at most a few minutes. Information is then transferred into LTMs for more permanent storage. One of the most important functions of

our brain is the laying down and recall of memories. It is difficult to

imagine how we could function without both short and long term

memory. The absence of short term memory would render most

tasks extremely difficult if not impossible - life would be punctuated by

a series of one time images with no logical connection between

them. Equally, the absence of any means of long term memory would

ensure that we could not learn by past experience.

The acquisition of knowledge is an active, ongoing cognitive

process based on our perceptions. An important point about the learning mechanism is that it distributes memory over different areas, making it robust to damage. Distributed storage permits the brain to work easily from partially corrupted information [11].

2.2.2 Associative Memory Model:

Associative memory maps [10, 12] data from an input space to data

in an output space. In general, this mapping is from unknown

domain points to known range points, where the memory learns an

underlying association from a training data set. For non-learning memory models, which have their origin in additive neuronal dynamics, connection strengths are "programmed" a priori depending upon the associations that are to be encoded in the system. Sometimes these memories are referred to as matrix associative memories, because a connection matrix $W$ encodes the associations $(A_i, B_i)$, $i = 1, \dots, Q$. Where $A_i$ is one of the programmed memories, $B_i$ is called the association of $A_i$. When $A_i$ and $B_i$ are in different spaces, the model is a hetero-associative memory, i.e., it associates two different vectors with one another. If $A_i = B_i$, the model is an auto-associative memory, i.e., it associates a vector with itself.

Associative memory models enjoy properties such as fault tolerance.
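A hetero-associative matrix memory and its fault tolerance can be illustrated with a small sketch. The symbols `A` and `B` for the key and associated vectors, the outer-product construction of the matrix, the sign threshold, and the example vectors are all assumptions made for illustration.

```python
def program_matrix(pairs):
    """Connection matrix as a sum of outer products B * A^T over all (A, B) pairs."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * n_in for _ in range(n_out)]
    for a, b in pairs:
        for r in range(n_out):
            for c in range(n_in):
                w[r][c] += b[r] * a[c]
    return w

def associate(w, a):
    """Map an input vector to its association: the sign of W times A."""
    return [1 if sum(wrc * ac for wrc, ac in zip(row, a)) >= 0 else -1
            for row in w]

A1, B1 = [1, 1, 1, 1, -1, -1, -1, -1], [1, -1, 1]
A2, B2 = [1, 1, -1, -1, 1, 1, -1, -1], [-1, -1, 1]
w = program_matrix([(A1, B1), (A2, B2)])

damaged_A1 = [-1] + A1[1:]            # one element of the key is flipped
print(associate(w, A1) == B1)         # True
print(associate(w, damaged_A1) == B1) # True: recall survives the damaged key
```

Input and output vectors have different lengths here, which is the hetero-associative case; passing each vector as its own association would give the auto-associative case.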

Types of Associative Neural Memories:

Associative neural memories are concerned with associative

learning and retrieval of information (vector patterns) in neural

networks. These networks represent one of the most extensively

analyzed classes of artificial neural networks. Several associative

neural memory models have been proposed over the last two decades.

These memory models can be classified in various ways, depending on:

• Architecture (static versus dynamic)

• Retrieval mode (synchronous versus asynchronous)

• Nature of the stored association (auto-associative versus hetero-associative)

• Complexity and capability of memory storage

Simple associative memories are static and have very low memory capacity, so they cannot be used in applications where high capacity is required [11]. Several modes can also be used to update the states of the units in both layers, namely synchronous, asynchronous, and a combination of the two. In the synchronous updating scheme, the states

of the units in a layer are updated as a group prior to propagating the

output to the other layer. In asynchronous updating, units in both

layers are updated in some order and output is propagated to the

other layer after each unit update. Lastly, in synchronous-

asynchronous updating, there can be subgroups of units in each


layer that are updated synchronously while units in each subgroup are

updated asynchronously.
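The difference between the synchronous and asynchronous modes can be seen even in a very small network. The sketch below is a simplification: it uses a single layer of two mutually connected units rather than the two-layer arrangement described above, and the weights and starting state are chosen only to make the contrast visible.

```python
def sign(h):
    return 1 if h >= 0 else -1

def synchronous_step(w, state):
    """All units are updated together from the same old state."""
    return [sign(sum(wij * sj for wij, sj in zip(row, state))) for row in w]

def asynchronous_sweep(w, state):
    """Units are updated one at a time; each sees the latest states of the others."""
    state = list(state)
    for i, row in enumerate(w):
        state[i] = sign(sum(wij * sj for wij, sj in zip(row, state)))
    return state

w = [[0, 1],
     [1, 0]]                          # two units that excite each other
start = [1, -1]

sync1 = synchronous_step(w, start)    # [-1, 1]
sync2 = synchronous_step(w, sync1)    # [1, -1], back at the start: an oscillation
async1 = asynchronous_sweep(w, start) # [-1, -1], a stable state
print(sync1, sync2, async1)
```

With these weights the synchronous mode cycles between two states, while the asynchronous sweep settles into a stable state after one pass.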

Dynamic associative memories such as Hopfield, Bidirectional Associative Memory (BAM), and Brain-State-in-a-Box (BSB) are dynamical memories, but they too support only a very low memory capacity, so they cannot be used in applications where high capacity is required. For this reason, we chose the context-sensitive auto-associative memory model for developing the expert system; it can also be compared with machine learning algorithms such as backpropagation, Bayesian networks, C4.5, and particle swarm optimization.

A simple model describing context-dependent associative memories generates a good vectorial representation of basic logical calculus. One of the strengths of this vectorial representation is the very natural way in which binary matrix operators are able to compute in ambiguous situations. This fact is of biological interest because of the very natural way in which the human mind is able to make decisions in the presence of uncertainty. These memories could also be used to develop expert agents for recent problem domains. According to [11], holographic memories are being used to build many advanced memory-based devices such as memory cards and USB drives. The advantage of using recurrent networks as associative memory is their convergence to one of a finite number of stable states when started at some initial state. The basic goals are:

• To be able to store as many exemplars as we need, each

corresponding to a different stable state of the network,


• To have no other stable state,

• To have the stable state that the network converges to be the one closest to the applied pattern.

The problems that we face with associative memories:

• The capacity of the network is restricted,

• Depending on the number and properties of the patterns to be stored, some of the exemplars may not be stable states,

• Some spurious stable states different from the exemplars may arise by themselves,

• The converged stable state may be other than the one closest to the applied pattern.

2.3. Related work:

Not surprisingly, researchers have also tried to use neural networks

in Cryptography. A recent survey of the literature indicates that there has

been an increasing interest in the application of different classes of neural

networks to problems related to cryptography in the past few years.

Recent works have examined the use of neural networks in

cryptosystems. Typical examples include key management, generation

and exchange protocols; visual cryptography; pseudo random generators;

digital watermarking; and steganalysis [13].

2.3.1 Interacting neural network and cryptography:

The goal of any cryptographic system is the exchange of

information among the intended users without any leakage of information

to others who may have unauthorized access to it. A common secret key can be created over a public channel accessible to any opponent, and neural networks can be used to generate this common secret key.


In neural cryptography, both communicating

networks receive an identical input vector, generate an output bit and

are trained based on the output bit. The two networks and their weight

vectors exhibit a novel phenomenon, where the networks synchronize to a

state with identical time-dependent weights. The generated secret key

over a public channel is used for encrypting and decrypting the

information being sent on the channel [14].
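The synchronization described above can be sketched with two small tree parity machines, an architecture commonly used in the neural cryptography literature. The text does not name an architecture, so the tree parity machine, the parameter values `K`, `N`, `L`, the Hebbian update with weight clipping, and the convention for a zero local field are all assumptions made for this illustration.

```python
import random

K, N, L = 3, 8, 3   # hidden units, inputs per hidden unit, weight bound (illustrative)

def sign(h):
    return 1 if h > 0 else -1         # assumed convention: a zero field counts as -1

def new_weights(rng):
    return [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]

def forward(w, x):
    """Hidden outputs sigma and the network output bit tau (their product)."""
    sigma = [sign(sum(wi * xi for wi, xi in zip(w[k], x[k]))) for k in range(K)]
    tau = 1
    for s in sigma:
        tau *= s
    return sigma, tau

def hebbian_update(w, x, sigma, tau):
    """Move only the hidden units that agree with the output; keep weights in [-L, L]."""
    for k in range(K):
        if sigma[k] == tau:
            for i in range(N):
                w[k][i] = max(-L, min(L, w[k][i] + tau * x[k][i]))

def synchronize(seed=0, max_steps=100000):
    rng = random.Random(seed)
    wa, wb = new_weights(rng), new_weights(rng)   # each party's private weights
    for step in range(1, max_steps + 1):
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]  # public input
        sa, ta = forward(wa, x)
        sb, tb = forward(wb, x)
        if ta == tb:                  # only the output bits are exchanged
            hebbian_update(wa, x, sa, ta)
            hebbian_update(wb, x, sb, tb)
        if wa == wb:
            return wa, wb, step
    return wa, wb, None

wa, wb, steps = synchronize(seed=0)
print("synchronized:", wa == wb, "after", steps, "steps")
```

In this sketch the equality check `wa == wb` is done by the simulation, which can see both weight sets. In a real exchange neither party can see the other's weights, so agreement would have to be inferred, for example from a long run of matching output bits.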

Based on chaotic neural networks, a Hash function can be

constructed, which makes use of neural networks' diffusion property

and chaos' confusion property. This function encodes the plaintext of

arbitrary length into the hash value of fixed length (typically, 128-bit,

256-bit or 512-bit). Theoretical analysis and experimental results show

that this hash function is one-way, with high key sensitivity and

plaintext sensitivity, and secure against birthday attacks or meet-in-the-

middle attacks. These properties make it a suitable choice for data

signature or authentication [15].

Neural cryptography deals with the problem of key exchange using

the mutual learning concept between two neural networks. The two

networks will exchange their outputs (in bits) so that the key between the

two communicating parties is eventually represented in the final learned

weights and the two networks are said to be synchronized. Security of

neural synchronization depends on the probability that an attacker can

synchronize with any of the two parties during the training process, so

decreasing this probability improves the reliability of exchanging their

output bits through a public channel [16].

Artificial neural networks have also been used to classify functional blocks from a disassembled program as being either cryptography related or not. The resulting system is referred to as NNLC (Neural Net for Locating Cryptography) [17].


When training a neural network it is tempting to experiment with

architectures until a low total error is achieved. The danger in doing so is

the creation of a network that loses generality by over-learning the

training data; lower total error does not necessarily translate into a low

total error in validation. The resulting network may keenly detect the

samples used to train it, without being able to detect subtle variations in

new data. A method is presented for choosing the best neural network

architecture for a given data set based on observation of its accuracy,

precision, and mean square error [18].

The method relies on k-fold cross-validation to evaluate each network architecture k times, which improves the reliability of the choice of the optimal architecture. The need for four separate divisions of the data set is demonstrated (testing, training, and validation, as normal, plus a comparison set). Instead of measuring only the total error, the resulting discrete measures of accuracy, precision, false positives, and false negatives are used. This method is then applied to the problem of

locating cryptographic algorithms in compiled object code for two

different CPU architectures to demonstrate the suitability of the method.
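The k-fold procedure and the discrete measures mentioned above can be sketched as follows. To keep the example self-contained, a one-parameter threshold classifier stands in for the neural network; the function names, the toy data, and that stand-in model are assumptions for illustration and do not reproduce the method of [18].

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        training = [i for i in range(n_samples) if i < start or i >= start + size]
        yield training, validation
        start += size

def discrete_measures(actual, predicted):
    """Accuracy, precision, and false positive / false negative counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive": fp,
        "false_negative": fn,
    }

def fit_threshold(xs, ys):
    """Stand-in 'model': the midpoint between the two class means."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def cross_validate(xs, ys, k):
    results = []
    for training, validation in k_fold_indices(len(xs), k):
        threshold = fit_threshold([xs[i] for i in training], [ys[i] for i in training])
        predicted = [1 if xs[i] > threshold else 0 for i in validation]
        results.append(discrete_measures([ys[i] for i in validation], predicted))
    return results

xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
results = cross_validate(xs, ys, k=5)
for fold, measures in enumerate(results):
    print(fold, measures)
```

Each candidate architecture would be passed through the same loop, and the per-fold measures compared rather than a single total error.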

2.4 Passwords:

Basics of Passwords: Passwords are at present the most common method for verifying the identity of a user. This is a flawed method, but systems continue to use passwords because of their ease of use and ease of implementation. Among the many problems are the successful guessing of users' passwords, and their interception or discovery online.

To prevent guessing and for additional security, the National Security

Agency (NSA) recommends using a random 8-letter password that is

regularly changed [19].

Since such a stream of passwords is almost impossible to

remember (certainly for me), the hapless user is forced to write these


passwords down, adding to the insecurity. Thus passwords need to be

protected by cryptographic techniques, whether they are stored or

transmitted. Several simple techniques can help make the old-fashioned

form of passwords easier to memorize. First, the system can present a

user with a list of possible random passwords from which to choose. With

such a choice, there may be one password that is easier for a given user to

remember. Second, the most common passwords are limited to 8

characters, and experience has shown that users have a hard time picking

such a short password that turns out to be secure.

If the system allows passwords of arbitrary length (fairly common now), then users can employ pass phrases: a phrase or sentence that is not going to be in dictionaries yet is easy for the given user to remember. My favorite pass phrase is "Dexter's mother's bread," but I won't be able to use it any more.

Personal physical characteristics form the basis for a number of identification methods now in use. The characteristics, or biometrics, range from fingerprints to iris patterns, from voice to hand geometry, among many examples. These techniques are outside the scope of this chapter.

The remainder of this section studies two uses of one-way functions to help secure passwords. A simple system password scheme would just have a secret file holding each user's account name and the corresponding password. There are several problems with this method: if someone manages to read this file, they can immediately pretend to be any of the users listed. Also, someone might find out about a user's likely passwords from passwords used in the past.

For the reasons above and others, early UNIX systems protected

passwords with a one-way function (described in an earlier chapter).

Along with the account name, the one-way function applied to the

password is stored. Thus given a user A, with account name NA and

password PA, and given a fixed one-way function h, the system would


store NA and h (PA) as a table entry in the password file, with similar

entries for other users. When A supplies her password to the system, the

software computes h of her password and compares this result with the

table entry. In this way the systems administrators themselves will not

know the passwords of users and will not be able to impersonate a user.

In early UNIX systems it was a matter of pride to make the password file world readable. A user would try to guess others' passwords by trying a guess P: first calculate h (P) and then compare this with all table entries. There were many values of P to try, such as entries in a dictionary, common names, special entries that are often used as passwords, all short passwords, and all the above possibilities with special characters at the beginning or the end. These "cracker" programs have matured to the point where they can always find at least some passwords if there are quite a few users in the system. Now the password file is no longer public, but someone with root privileges can still get to it, and it sometimes leaks out in other ways.

To make the attack in the previous paragraph harder (that attack is essentially the same as ciphertext searching), systems can first choose the one-way function h to be more expensive to compute. This only slows down the searches by a linear factor. Another approach uses an additional random table entry, called a salt. Suppose, for example, that each password table entry has another random t-bit field (the salt), different for each password. When Alice first puts her password into the system (or changes it), she supplies PA. The system chooses the salt SA and calculates EA = h (PA, SA), where h is adapted to take two inputs instead of one.

The password file entry for Alice now contains NA, SA, and EA. With this change, an attack on a single user is the same, but the attack of the previous paragraph on all users at the same time now takes an


extra factor of time equal to either 2^t or the number of users, whichever is smaller. Without the salt, an attacker could check if "Dexter" were the password of any user by calculating h ("Dexter") and doing a fast search of the password file for this entry. With the salt, to check if Alice is using "Dexter", for example, the attacker must retrieve Alice's salt SA and calculate h ("Dexter", SA). Each user requires a different calculation, so this simple device greatly slows down the dictionary attack.
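A salted, deliberately slow password entry of the kind described above might look like the following sketch. PBKDF2-HMAC-SHA256 from Python's standard library is used here as a stand-in for the abstract two-input function h; it is not the function early UNIX systems used, and the iteration count, the 16-byte salt, and the dictionary layout of the entry are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000                  # illustrative work factor (assumption)

def make_entry(name, password):
    """Create a password file entry holding the name, the salt, and h(password, salt)."""
    salt = os.urandom(16)             # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return {"name": name, "salt": salt, "hash": digest}

def verify(entry, candidate):
    """Recompute h(candidate, salt) and compare it with the stored value."""
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"),
                                 entry["salt"], ITERATIONS)
    return hmac.compare_digest(digest, entry["hash"])

alice = make_entry("alice", "Dexter's mother's bread")
bob = make_entry("bob", "Dexter's mother's bread")

print(verify(alice, "Dexter's mother's bread"))   # True
print(verify(alice, "Dexter"))                    # False
print(alice["hash"] == bob["hash"])               # False: same password, different salts
```

Because the two entries differ even for identical passwords, an attacker can no longer hash one guess and search the whole file for it; each guess must be recomputed per user.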

Text Password:

Password strength is a measure of the effectiveness of

a password in resisting guessing and brute-force attacks. In its usual form,

it estimates how many trials an attacker who does not have direct access

to the password would need, on average, to guess it correctly. The

strength of a password is a function of length, complexity, and

unpredictability [20].

However, other attacks on passwords can succeed without a brute

search of every possible password. For instance, knowledge about a user

may suggest possible passwords (such as pet names, children's names,

etc.). Hence estimates of password strength must also take into account resistance to other attacks. Using strong passwords lowers the

overall risk of a security breach, but strong passwords do not replace the

need for other effective security controls. The effectiveness of a password

of a given strength is strongly determined by the design and

implementation of the authentication system software, particularly how

frequently password guesses can be tested by an attacker and how

securely information on user passwords is stored and transmitted. Risks

are also posed by several means of breaching computer security which

are unrelated to password strength.


Determining Password Strength:

There are two factors to consider in determining password strength:

the ease with which an attacker can check the validity of a guessed

password, and the average number of guesses the attacker must make to

find the correct password. The first factor is determined by how the

password is stored and what it is used for, while the second factor is

determined by how long the password is, what set of symbols it is drawn

from and how it is created.

Password Guess Validation:

The most obvious way to test a guessed password is to attempt to

use it to access the resource the password was meant to protect. However,

this can be slow and many systems will delay or block access to an

account after several wrong passwords are entered. On the other hand,

systems that use passwords for authentication must store them in some

form to check against entered values. Usually only a cryptographic hash of a password is stored instead of the password itself. If the hash is strong

enough, it is very hard to reverse it, so an attacker that gets hold of the

hash value cannot directly recover the password. However, if the

cryptographic hash data files have been stolen, knowledge of the hash

value lets the attacker quickly test guesses.

Password Creation:

Passwords are created either automatically (using randomizing

equipment) or by a human. The strength of randomly chosen passwords

against a brute force attack can be calculated with precision. Commonly,

passwords are initially created by asking a human to choose a password,

sometimes guided by suggestions or restricted by a set of rules. This

typically happens at the time of account creation for computer systems or

Internet Web sites. In this case, only estimates of strength are possible,


since humans tend to follow patterns in such tasks, and those patterns

may assist an attacker [21].

Password strength depends on symbol set and length:

Increasing the number of possible symbols from which random passwords are chosen will increase the strength of generated passwords of any given length. For example, the printable characters in the American Standard Code for Information Interchange (ASCII) character set (roughly those on a standard U.S. English keyboard) include 26 letters (in two case variants), 10 digits, and 32 non-alphanumeric symbols (i.e., punctuation, grouping, etc.), for a total of 94 symbols (95 if space is included). However, the same strength can always be achieved with a smaller symbol set by choosing a longer password. In the extreme, binary passwords can be very secure, even though they use only two possible symbols. Thus a 14-character password consisting of only random lowercase letters has about the same strength (4.7 × 14 = 65.8 bits) as a ten-character password chosen at random from all printable ASCII characters (65.55 bits).
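The bit figures quoted above follow from treating each randomly chosen symbol as contributing log2(symbol count) bits. A short check, assuming uniformly random and independent symbols (the function name `password_bits` is illustrative):

```python
import math

def password_bits(symbol_count, length):
    """Strength in bits of a password of uniformly random, independent symbols."""
    return length * math.log2(symbol_count)

lowercase_14 = password_bits(26, 14)  # 14 random lowercase letters
ascii_10 = password_bits(94, 10)      # 10 random printable ASCII characters (no space)
print(round(lowercase_14, 2), round(ascii_10, 2))   # 65.81 65.55
```

The same function shows the trade-off in the other direction: a purely binary password needs about 66 symbols to reach the same strength.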

Guidelines for Passwords:

Common guidelines for choosing good passwords are designed to make passwords less easily discovered by intelligent guessing [16-19]:

• Password length should be around 12 to 14 characters if permitted, and longer still if possible while remaining memorable.

• Use randomly generated passwords where feasible.

• Avoid any password based on repetition, dictionary words, letter or number sequences, usernames, relative or pet names, romantic links (current or past), or biographical information (e.g., ID numbers, ancestors' names or dates).

• Include numbers and symbols in passwords if allowed by the system.

• If the system recognizes case as significant, use capital and lower-case letters.

• Avoid using the same password for multiple sites or purposes.

• If you write your passwords down, keep the list in a safe place, such as a wallet or safe, not attached to a monitor or in an unlocked desk drawer.
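For the guideline on randomly generated passwords, a minimal generator can be built on Python's `secrets` module, which draws from the operating system's cryptographic random source. The default length of 14 and the 94-symbol alphabet are choices made for this sketch, not requirements from the cited guidelines.

```python
import secrets
import string

# Letters in both cases, digits, and punctuation: the 94 printable ASCII symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length=14, alphabet=ALPHABET):
    """Pick each character independently and uniformly from the alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

password = random_password()
print(len(ALPHABET), len(password))   # 94 14
```

A system with a restricted symbol set can pass a smaller `alphabet` and compensate with a larger `length`.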

Protecting passwords:

Computer users are generally advised to "never write down a

password anywhere, no matter what" and "never use the same password

for more than one account." However, an ordinary computer user may

have dozens of password-protected accounts. Users with multiple

accounts needing passwords often give up and use the same password for

every account. When varied password complexity requirements prevent

use of the same (memorable) scheme for producing high-strength

passwords, overly simplified passwords will often be created to satisfy

irritating and conflicting password requirements. A Microsoft expert was

quoted as saying at a 2005 security conference: "I claim that password

policy should say you should write down your password. I have 68

different passwords. If I am not allowed to write any of them down, guess

what I am going to do? I am going to use the same password on every one of them." [20]

Limitations of alphanumeric passwords:

The main problem with the alphanumeric passwords is that once a

password has been chosen and learned the user must be able to recall

it to log in. But, people regularly forget their passwords. If a password

is not frequently used it will be even more susceptible to forgetting.

Recent surveys have shown that users select short, simple passwords that are easily guessable, for example, personal names of their family members, names of pets, dates of birth, etc. [25]. For users, the most


important issue is having a password that can be remembered reliably

and input quickly. They are unlikely to give priority to security over their

need to get on with their work.

Graphical Passwords:

Graphical passwords were originally described by Blonder (1996). The basic motivation for graphical passwords is that they are expected to be easier to recall, less likely to be written down, and have the potential to provide a richer symbol space than text-based passwords. For example, a user might authenticate

by clicking a series of points on an image, selecting a series of tiles, or by

drawing lines on the screen [26]. Because human beings live and interact

in an environment where the sense of sight is predominant for most

activities, our brains are capable of processing and storing large amounts

of graphical information with ease. While we may find it very hard to

remember a string of fifty characters, we can easily remember

faces of people, places we visited, and things we have seen. These

graphical data represent millions of bytes of information and thus provide

large password spaces. Thus, graphical password schemes provide a way

of making more human-friendly passwords while increasing the level of

security.

Security Properties of Graphical Passwords:

Dictionary attacks are infeasible, partly because of the large

password space, but mainly because there are no pre-existing searchable

dictionaries for graphical information. It is also difficult to devise

automated attacks. Whereas we can recognize a person's face in less than

a second, computers spend a considerable amount of time processing

millions of bytes of information regardless of whether the image is a face,

a landscape, or a meaningless shape.


Graphical password schemes have been proposed as a possible alternative to text-based schemes, motivated partially by the fact that humans can remember pictures better than text; psychological studies support this assumption [27]. Pictures are generally easier to remember or recognize than text. In addition, if the number of possible pictures is sufficiently large, the possible password space of a graphical password scheme may exceed that of text-based schemes and thus presumably offer better resistance to dictionary attacks. Because of these (presumed) advantages, there is a growing interest in graphical passwords. In addition to workstation and web log-in applications, graphical passwords have also been applied to ATMs and mobile devices. A comprehensive survey of the existing graphical password techniques has been conducted. We will discuss the strengths and limitations of each method and also point out future research directions in this area. In conducting this survey, we want to answer the following questions: are graphical passwords as secure as text passwords, and what are the major design and implementation issues for graphical passwords?

2.5. Traditional Password Authentication:

In current web-based login protocols, a person logs in to a service provider by sending his user identity and password to the server in question, which then looks up the corresponding record in its database and performs a comparison to determine whether the password is valid. The password is typically not stored in plaintext; rather, a "salted one-way function" of the password is stored. This means that if somebody gains access to the database of the service provider, they will not be able to obtain plaintext passwords. However, the password itself is generally sent before the salted one-way function is applied. In order to protect the session against an eavesdropper, it is common to encrypt the transmission


of the password. However, passwords are only used in situations where

the two communicating machines do not store any prior cryptographic

key – if they did, then passwords would be an inferior alternative to

standard cryptographic authentication mechanisms such as digital

signatures (such as RSA) or message authentication codes (such as Hash-

based Message Authentication Code (HMAC)).
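When the two machines do already share a cryptographic key, a message authentication code of the kind mentioned above can be sketched with Python's standard `hmac` module. The key, the message, and the choice of SHA-256 are placeholders assumed for this example.

```python
import hashlib
import hmac

def make_tag(key, message):
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key, message, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"example shared key"            # placeholder key (assumption)
message = b"login request from client 17"     # placeholder message (assumption)
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))               # True
print(verify_tag(shared_key, b"tampered request", tag))   # False
print(verify_tag(b"wrong key", message, tag))             # False
```

Only a party holding the shared key can produce a valid tag, so no human-memorable secret has to cross the channel.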

2.5.1 Vulnerabilities of current password authentication practices:

There are many potential attacks that can be mounted on the

common password authentication method. To begin with, one should

notice that the above described method does not offer any protection

against an attack in which an attacker claims to be a service provider, and

convinces a user to attempt to log in – clearly, if there is no encryption,

then the attacker will simply obtain the password of his victim. The same
holds if encryption is used but the attacker sends the user a public key
for which the attacker knows the corresponding secret key, instead of the
bank's public key. This may occur even if certificates are employed [28]; in fact,
in many, if not most, scenarios, average users are not capable of
distinguishing authentic from illegitimate certificates.

Recently, this type of attack has become very common. It is
used by attackers wanting to perform identity theft and is referred to as
phishing. Its popularity can be seen in the now daily
examples of so-called phishers trying to harvest passwords by sending out
emails that appear to originate from a bank. Even though the phishers'
success rate is relatively low, this is a profitable attack, as evidenced by
how common it is. This is due to the ease with which attackers can spam
large populations at a negligible cost, and the straightforwardness of
spoofing emails. There are indications that the problem may become
worse as attackers become more sophisticated [30].

2.5.2 Password Authenticated Protocols:

User authentication could be defined as a process in which one

party is assured of the identity of a second party involved in the protocol.

It is generally accomplished by one or more of the following [30]:

a) Something known. Examples include standard passwords, Personal

Identification Numbers (PINs), and the secret or private keys

whose knowledge is used in challenge-response protocols.

b) Something possessed. This is normally a physical accessory such as a
magnetic-striped card, a chip card, or a hand-held customized
calculator (password generator) that provides time-variant
passwords.

c) Something inherent (to a human individual). This category includes

methods which make use of physical characteristics and actions of

human beings (biometrics), such as handwritten signatures,

fingerprints, voice, retinal patterns, hand geometries, and dynamic

keyboarding characteristics.

The least expensive and the most convenient solutions for user

authentication have been based on the first category. But authentication

without key exchange would not help much. These two topics need to be

considered jointly rather than separately. As pointed out in [31], a
protocol providing authentication without key exchange is susceptible
to an enemy who waits until the authentication is complete and then
takes over one end of the communications line. The same holds for key
exchange that is independent of authentication. It is therefore important to
make sure that the key exchanged is in fact shared with the intended party
and not an adversary.

In [32], the authors of Symbol Native Application Programming
Interface (SNAPI) classify user authentication schemes into those that
require persistent data to be stored on the user's system and those that do
not. The former category includes schemes similar to Secure Shell (SSH),
where persistent participant-specific information is stored on the client's
system. As mentioned above, these schemes require extra security
assumptions. The second category includes password-based protocols like
the popular Encrypted Key Exchange (EKE) family of protocols. These were
later followed by Augmented-EKE (A-EKE), Modified-EKE (M-EKE),
Simple Password EKE (SPEKE), Diffie-Hellman EKE (DH-EKE), the Secure
Remote Password protocol (SRP) and so on.

In recent years several password-only protocols have been proposed. Their
growing importance stems from the fact that they are based on direct trust
between a user and a server and do not require the user to store long
secrets or data on the user's system. These protocols can be used not only
for user authentication with the server but also for mutual authentication
between any two users. Below we summarize the characteristics of
password-based key establishment protocols:

1. The passwords selected by users usually belong to a small dictionary
and have low entropy, which makes it possible for the adversary to
search through all possible passwords in a reasonable time.

2. Off-line dictionary attacks should not be possible. This means that
the adversary should not be able to partition the dictionary into valid
and invalid passwords just by gathering information during a valid
exchange.

3. On-line dictionary attacks should not be feasible. These attacks can
be easily detected, and thwarted, by counting access failures.

4. The protocol should provide mutual authentication.
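The idea of thwarting on-line dictionary attacks by counting access failures can be sketched as below. This is only an illustration under stated assumptions: the class name `LoginGuard`, the threshold of three failures, and the in-memory plaintext table are hypothetical and are not taken from any of the protocols cited above.

```python
from collections import defaultdict

MAX_FAILURES = 3  # assumed threshold; the text does not give a number


class LoginGuard:
    """Counts consecutive failed logins per user and locks the account."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)  # plaintext table, for illustration only
        self._failures = defaultdict(int)

    def attempt(self, user, password):
        if self._failures[user] >= MAX_FAILURES:
            return "locked"                # further guesses are refused
        if self._passwords.get(user) == password:
            self._failures[user] = 0       # success resets the counter
            return "ok"
        self._failures[user] += 1
        return "denied"


guard = LoginGuard({"Vamsi": "Vss123"})
results = [guard.attempt("Vamsi", guess) for guess in ["a", "b", "c", "Vss123"]]
print(results)  # ['denied', 'denied', 'denied', 'locked']
```

After three wrong guesses even the correct password is refused, so an on-line attacker cannot walk through a dictionary; a real system would also need an unlock or back-off policy.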

2.5.3 Attacks against Passwords:

Many systems break because they rely on user-generated

passwords. Left to themselves, people don't choose strong passwords. If

they're forced to use strong passwords, they can't remember them. If the

password becomes a key, it's usually much easier and faster to guess the

password than it is to brute-force the key; we've seen elaborate security

systems fail in this way. Some user interfaces make the problem even

worse: limiting the passwords to eight characters, converting everything

to lower case, etc. Even passphrases can be weak: searching through 40-

character phrases is often much easier than searching through 64-bit

random keys. We've also seen key-recovery systems that circumvent

strong session keys by using weak passwords for key-recovery.
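The gap between a restricted password and a random key can be made concrete with a little arithmetic. The sketch below counts the bits needed to enumerate every eight-character lowercase password and compares that with a 64-bit key. The figure of about 1.3 bits per character for an English passphrase is an outside assumption (a commonly quoted estimate), not a value given in the text.

```python
import math


def search_space_bits(alphabet_size, length):
    """Bits needed to enumerate every string of the given length."""
    return length * math.log2(alphabet_size)


lowercase_8 = search_space_bits(26, 8)   # eight characters, lower case only
print(round(lowercase_8, 1))             # 37.6
print(64 - lowercase_8 > 26)             # True: over 2**26 times fewer candidates

# Assumed entropy of English text, roughly 1.3 bits per character.
passphrase_40 = 40 * 1.3
print(passphrase_40 < 64)                # True
```

Under these assumptions both the restricted password and the 40-character phrase fall well short of a 64-bit random key, which is the point the paragraph above makes.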

2.5.4 Alternatives to Password Authentication:

The numerous ways in which permanent or semi-permanent

passwords can be compromised has prompted the development of other

techniques. Unfortunately, some are inadequate in practice, and in any

case few have become universally available for users seeking a more

secure alternative.

Single-use passwords are valid only once, which makes many potential attacks
ineffective. Most users find single-use passwords extremely inconvenient.

They have, however, been widely implemented in personal online

banking, where they are known as Transaction Authentication

Numbers (TANs). As most home users only perform a small number of

transactions each week, the single use issue has not led to intolerable

customer dissatisfaction in this case.

Time-synchronized one-time passwords are similar in some ways to

single-use passwords, but the value to be entered is displayed on a small

(generally pocketable) item and changes every minute or so.

Pass Window one-time passwords are used as single-use passwords, but

the dynamic characters to be entered are visible only when a user

superimposes a unique printed visual key over a server generated

challenge image shown on the user's screen.

Access controls based on public key cryptography e.g. SSH. The

necessary keys are usually too large to memorize and must be stored on a

local computer, security token or portable memory device, such as a USB

flash drive or even floppy disk.

Biometric methods promise authentication based on unalterable personal

characteristics, but currently (2008) have high error rates and require

additional hardware to scan, for example, fingerprints, irises, etc. They

have proven easy to spoof in some famous incidents testing commercially

available systems, for example, the gummie fingerprint spoof

demonstration,[33] and, because these characteristics are unalterable, they

cannot be changed if compromised; this is a highly important

consideration in access control as a compromised access token is

necessarily insecure.

Single sign-on technology is claimed to eliminate the need for having

multiple passwords. Such schemes do not relieve users and administrators

from choosing reasonable single passwords, nor system designers or

administrators from ensuring that private access control information

passed among systems enabling single sign-on is secure against attack.

As yet, no satisfactory standard has been developed.

Envaulting technology is a password-free way to secure data on, e.g.,

removable storage devices such as USB flash drives. Instead of user

passwords, access control is based on the user's access to a network

resource.

Non-text-based passwords, such as graphical passwords or mouse-

movement based passwords.[34] Graphical passwords are an alternative

means of authentication for log-in intended to be used in place of

a conventional password; they use images, graphics or colors instead

of letters, digits or special characters. One system requires users to select

a series of faces as a password, utilizing the human brain's ability to recall

faces easily.[35] In some implementations the user is required to pick

from a series of images in the correct sequence in order to gain

access.[36] Another graphical password solution creates a one-time

password using a randomly-generated grid of images. Each time the user

is required to authenticate, they look for the images that fit their pre-

chosen categories and enter the randomly-generated alphanumeric

character that appears in the image to form the one-time password [37,
38]. So far, graphical passwords are promising, but are not widely used.

Studies on this subject have been made to determine its usability in the

real world. While some believe that graphical passwords would be harder

to crack, others suggest that people will be just as likely to pick common

images or sequences as they are to pick common passwords.

2D Key (Two-Dimensional Key) is a 2D matrix-like key input method
with key styles such as multiline passphrase, crossword, and ASCII/Unicode
art, with optional textual semantic noise. It is used to create a large
password/key beyond 128 bits and thereby realize MePKC (Memorizable
Public-Key Cryptography), which uses a fully memorizable private key on top
of current private key management technologies such as encrypted private
key, split private key, and roaming private key.

Cognitive passwords use question and answer cue/response pairs to verify

identity.
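The time-synchronized one-time passwords listed above can be illustrated with a short sketch. It assumes the construction standardized in RFC 6238 (TOTP), which the text does not name; the 30-second step, the HMAC-SHA-1 choice and the function name `totp` are therefore assumptions of this example rather than details of the devices described.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """One-time password derived from a shared secret and the current time."""
    counter = unix_time // step                       # index of the time window
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                           # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


shared = b"12345678901234567890"
print(totp(shared, 59, digits=8))  # 94287082 (published RFC 6238 test vector)
```

Token and server compute the same value independently as long as their clocks agree to within one time step, so the displayed code changes without any message being exchanged.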

2.5.5 Traditional authentication schemes and their disadvantages:

Normally, the system assigns a username and password to each
authorized user. In order to check whether a user is authorized, each
server stores all these usernames and passwords in a table. Whenever a
user wants to obtain a service from the server, he submits his username and
password; the server then uses the information stored in the password table to
check whether the user is authorized.

USER NAME PASSWORD

Harsh 25-may1991

Vamsi Vss123

Suresh Abecke

Aditya Letmein

Sanjay 24dk03k

Table 2.2. Password table

In order to enhance the security of the system, the proposed work applies
certain encryption algorithms to the passwords, so that the passwords in the
password table are stored in encrypted form.

USER NAME ENCRYPTED PASSWORD

Harsha ↑ǨNʒ╚⌡

Vamsi kJɊ 26v

Suresh D₧£ȣ

Aditya ¥~Ϳ� ¥2p

Sanjay efeaeolg

Table 2.3. Verification table

This type of approach has many limitations. An attacker
can easily change the details of users by using attacks such as SQL injection,
and the password table occupies a lot of memory on the server.
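The SQL injection risk mentioned above can be demonstrated on a toy password table. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table layout and the function names `login_unsafe` and `login_safe` are hypothetical, and the plaintext password column simply mirrors Table 2.2 for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Vamsi", "Vss123"))


def login_unsafe(user, password):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = f"SELECT 1 FROM users WHERE username = '{user}' AND password = '{password}'"
    return conn.execute(query).fetchone() is not None


def login_safe(user, password):
    # Placeholders keep the input as data, never as SQL syntax.
    query = "SELECT 1 FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (user, password)).fetchone() is not None


injection = "' OR '1'='1"
print(login_unsafe("Vamsi", injection))  # True: the password check is bypassed
print(login_safe("Vamsi", injection))    # False
```

Parameterized queries remove this particular attack, but they do not address the other limitation noted above, namely the memory taken up by the table itself.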

2.6 Probabilistic Approach Feed Forward Neural Network:

In a feed forward network, the output of the neurons (units) in one layer
is passed as input to the next layer, and this process continues until
the output layer units receive input from the previous layer. Finally, these
output units yield an output. The output of the network depends on the input,
the connection strengths (weight values), and the output function used in each
layer. If any of these is modified, the output of the network
changes. Taking advantage of this fact, we can perform encryption
so that no attacker can decrypt it easily.

Here, if P is a row matrix representing the input and W is a matrix
representing the weights of the network, then a feed forward network
produces cipher text in the following way.

(2.1)

Figure 2.5. Feed forward Network (plain text as input, cipher text as output)
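Since equation (2.1) did not survive in this copy, the following sketch shows only one plausible reading of it: a single layer with an identity output function, so that the cipher text is the product C = P W and decryption multiplies by the inverse of W. This form, the 3 x 3 matrix values, and the function names are assumptions made for illustration and should not be taken as the author's exact method.

```python
from fractions import Fraction


def encrypt(p, w):
    """Row vector p times square matrix w (assumed form C = P W)."""
    n = len(w)
    return [sum(p[i] * w[i][j] for i in range(n)) for j in range(n)]


def inverse(w):
    """Gauss-Jordan inverse over exact fractions; fails if det(w) == 0."""
    n = len(w)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(w)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("weight matrix is singular, so decryption is impossible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def decrypt(c, w):
    """Recover the plain text as C times the inverse of W."""
    return encrypt(c, inverse(w))


# Hypothetical 3 x 3 weight matrix with determinant 18.
W = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
# "CAB" as normalized values 0.08, 0.00, 0.04 (see Section 2.6.3).
P = [Fraction(2, 25), Fraction(0), Fraction(1, 25)]
C = encrypt(P, W)
print(C == [Fraction(4, 25), Fraction(3, 25), Fraction(4, 25)])  # True
print(decrypt(C, W) == P)                                         # True
```

Exact fractions are used so that the round trip is free of rounding error; with floating-point weights, small errors in the cipher text would be amplified by the inverse.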

2.6.1 Weight Matrix Calculation:

In order to generate a unique cipher text for a given plain text, we have to
choose a weight matrix whose rank is equal to the length of the plain text.
Hence the determinant (DET) of the weight matrix must not be zero, i.e.

$$\det(W) \neq 0 \qquad (2.2)$$

A Weight Matrix

$$\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ w_{31} & w_{32} \end{bmatrix} \qquad (2.3)$$

The weight matrix can be defined in C# as follows:

Matrix weightMatrix = new Matrix (3,2);

The threshold variable is not multidimensional, like the weight matrix.

There is one threshold value per neuron. Each neuron in the second layer

has an individual threshold value. These values can be stored in an array

of C# double variables. The following code shows how the entire

memory of the two layers can be defined.

Matrix weightMatrix = new Matrix (3,2);

double [] thresholds = new double[2];

These declarations include both the 3x2 matrix and the two threshold

values for the second layer. There is no need to store threshold values for

the first layer, since it is not connected to another layer. Weight matrix

and threshold values are only stored for the connections between two

layers, not for each layer. The preferred method for storing these values

is to combine the thresholds with the weights in a combined matrix. The

above matrix has three rows and two columns. The thresholds can be

thought of as the fourth row of the weight matrix, which can be defined

as follows:

Matrix weightMatrix = new Matrix(4,2);

The combined threshold and weight matrix is described in equation

2.4. In this equation, the variable w represents the cells used to store

weights and the variable t represents the cells used to hold thresholds.

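A Threshold and Weight Matrix

$$\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ w_{31} & w_{32} \\ t_{1} & t_{2} \end{bmatrix} \qquad (2.4)$$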

Combining the thresholds and weights in one matrix has several

advantages. This matrix now represents the entire memory of this layer of

the neural network and you only have to deal with a single structure.
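Why the thresholds can be treated as a fourth row of the weight matrix can be seen in a small sketch. It assumes that each second-layer neuron computes the weighted sum of its inputs plus its threshold (the sign convention is an assumption), and the numeric values and function names are hypothetical. Appending a constant 1 to the input then makes the combined 4 x 2 matrix give the same result as the separate 3 x 2 matrix and threshold array.

```python
def layer_separate(inputs, weights, thresholds):
    """Weighted sum per neuron, with the threshold added afterwards."""
    cols = len(thresholds)
    return [sum(x * row[j] for x, row in zip(inputs, weights)) + thresholds[j]
            for j in range(cols)]


def layer_combined(inputs, combined):
    """Same computation with the thresholds stored as the last matrix row."""
    extended = list(inputs) + [1.0]   # constant input that multiplies the threshold row
    cols = len(combined[0])
    return [sum(x * row[j] for x, row in zip(extended, combined))
            for j in range(cols)]


weights = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.5]]  # 3 x 2 weight matrix
thresholds = [0.5, -0.25]                            # one value per neuron
combined = weights + [thresholds]                    # 4 x 2 combined matrix
inputs = [1.0, 2.0, 4.0]

print(layer_separate(inputs, weights, thresholds))   # [-0.5, 2.25]
print(layer_combined(inputs, combined))              # [-0.5, 2.25]
```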

This work provides the facility to define a custom character set and to
normalize it.

2.6.2 Defining Own Character Set:

In order to define a character set, this work specifies the characters of the
character set in a particular order, together with the maximum or minimum
value of the character set. Any organization that wants to use this technique
can define its own character set. If the organization wants to use existing
character sets such as ASCII or UNICODE, it can use them as they are. To
enhance security, it can change the order of the characters
and the value assigned to each character.

Character  Unique code   Character  Unique code   Character  Unique code
A          11            J          20            S          29
B          12            K          21            T          30
C          13            L          22            U          31
D          14            M          23            V          32
E          15            N          24            W          33
F          16            O          25            X          34
G          17            P          26            Y          35
H          18            Q          27            Z          36
I          19            R          28

Table 2.4. Example Character Set

Table 2.4 shows a character set in which the minimum value is 11 and the
maximum value is 36. The technique allows any user-defined
minimum and maximum values to be assigned to the set.

2.6.3 Normalizing Character Set:

In normalization, we convert each unique number assigned to a
character into a probabilistic value.
We use the following formula to find the probabilistic value of each
character:

$$C_n = \frac{C_t - C_{min}}{C_{max} - C_{min}}$$

where $C_n$ is the normalized value of the given character, $C_{max}$ is the
maximum value of the character set, $C_{min}$ is the minimum value of the
character set, and $C_t$ is the value of the given character. With this method we
get $C_n$ values in the range [0, 1].

Character  Probabilistic value   Character  Probabilistic value   Character  Probabilistic value
A          0.00                  J          0.36                  S          0.72
B          0.04                  K          0.40                  T          0.76
C          0.08                  L          0.44                  U          0.80
D          0.12                  M          0.48                  V          0.84
E          0.16                  N          0.52                  W          0.88
F          0.20                  O          0.56                  X          0.92
G          0.24                  P          0.60                  Y          0.96
H          0.28                  Q          0.64                  Z          1.00
I          0.32                  R          0.68

Table 2.5. Normalized Character Set
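The normalization step can be checked with a short sketch that rebuilds the unique codes of Table 2.4 and applies the min-max formula of this section. The helper names `build_codes` and `normalize` are hypothetical; the values 11 and 36 and the alphabetical order are taken from the tables above.

```python
import string


def build_codes(characters, c_min):
    """Assign consecutive unique codes, starting at c_min, in the given order."""
    return {ch: c_min + i for i, ch in enumerate(characters)}


def normalize(code, c_min, c_max):
    """Min-max normalization of a unique code into the range [0, 1]."""
    return (code - c_min) / (c_max - c_min)


codes = build_codes(string.ascii_uppercase, 11)        # A=11 ... Z=36
c_min, c_max = min(codes.values()), max(codes.values())
table = {ch: round(normalize(v, c_min, c_max), 2) for ch, v in codes.items()}
print(table["A"], table["B"], table["M"], table["Z"])  # 0.0 0.04 0.48 1.0
```

With 26 characters the step between neighbouring values is 1/25 = 0.04, which is the spacing seen in Table 2.5.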

Changing order of characters

This method can use existing standard character sets and
improve security by changing the order of the characters and the minimum value.
Doing so changes the unique number and the probabilistic value associated with
each character, so that an attacker is confused when guessing the unique numbers
or probabilistic values. For example, changing the order of the characters
listed in Tables 2.4 and 2.5 produces the changes shown in Tables 2.6 and 2.7.

Character  Unique code   Character  Unique code   Character  Unique code
A          11            J          20            S          29
Z          12            G          21            Q          30
C          13            F          22            U          31
M          14            D          23            E          32
V          15            N          24            W          33
L          16            I          25            H          34
K          17            P          26            Y          35
X          18            T          27            B          36
O          19            R          28

Table 2.6. Changes in Unique code after changing the order of characters

Character  Probabilistic value   Character  Probabilistic value   Character  Probabilistic value
A          0.00                  J          0.36                  S          0.72
Z          0.04                  G          0.40                  Q          0.76
C          0.08                  F          0.44                  U          0.80
M          0.12                  D          0.48                  E          0.84
V          0.16                  N          0.52                  W          0.88
L          0.20                  I          0.56                  H          0.92
K          0.24                  P          0.60                  Y          0.96
X          0.28                  T          0.64                  B          1.00
O          0.32                  R          0.68

Table 2.7. Changes in probabilistic values after changing the order of characters

As Table 2.6 shows, changing the order of the letters
automatically changes the unique code assigned to each of them, and after
normalization the probabilistic values change as well. This provides more
secure authentication when this input is given to password authentication
schemes.
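The effect of reordering can be reproduced with a small sketch that encodes the same word under the alphabetical order of Table 2.4 and under the order read column by column from Table 2.6. The function name `probabilistic_values` and the sample word are chosen for illustration only.

```python
import string

# Character order read column by column from Table 2.6.
REORDERED = "AZCMVLKXO" "JGFDNIPTR" "SQUEWHYB"


def probabilistic_values(text, order, c_min=11):
    """Normalized values of each character of text under a given order."""
    codes = {ch: c_min + i for i, ch in enumerate(order)}
    c_max = c_min + len(order) - 1
    return [round((codes[ch] - c_min) / (c_max - c_min), 2) for ch in text]


print(probabilistic_values("BED", string.ascii_uppercase))  # [0.04, 0.16, 0.12]
print(probabilistic_values("BED", REORDERED))               # [1.0, 0.84, 0.48]
```

The same three letters map to entirely different inputs for the network, which is the property the method relies on; the order itself then has to be kept secret alongside the weights.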

Advantages of the Proposed Method:

Since this method provides the opportunity to change the character set
and the unique code of each character, it is very difficult to attack
and thus provides more security. In order to decrypt the cipher text, an attacker
has to identify the weight matrix. Even if he identifies the weight matrix,
he may not know the algorithm used, the number of hidden layers, the output
function used, the character set (including the order of characters in the
character set), or the minimum and maximum values used in the character set.
If even one element of the matrix changes, the attacker cannot decrypt the
cipher text. The security of this method can be enhanced by increasing the
number of hidden layers in the network. The users of this algorithm can define
their own character set; by doing so, they can add new characters to their
character set.

Disadvantages:

There is a chance of noise when the input is given, and any noise in
the cipher text makes decryption very difficult. In addition, decrypting the
cipher text requires calculating the inverse of the weight matrix.

2.6.4 Implementation Details:

Once the password is given and the "Encrypt" button is pressed, the
corresponding probabilistic values and the corresponding weight matrix can be
identified and observed. This application can provide probabilistic values
for different passwords. If the weight matrix is changed, the encrypted
password also changes, and the application allows its users to choose
different weight matrices while encrypting the password.

Figure 2.6. Password Encryption into Probabilistic Values

If the weight matrix is changed, the resulting changes in the encrypted
text can be observed in Figure 2.7.

Figure 2.7. Probabilistic values after Weight Matrix Change

2.7 Proposed Work:

As already discussed, the proposed method takes a neural network
approach to authentication using a feed forward neural network.
The input to the network is the username and the output is the
password. When a user submits his login credentials, his
username is given as input to the network and we check whether the output
of the network and the specified password are equal; if they are, the
user is authorized, and otherwise he is rejected. In this work a separate
character set for alphanumeric data, including special characters, is designed
using the ASCII, GRAY CODE, EBCDIC and UNICODE characters.
The next step is normalization: each character is normalized into a
probabilistic value in the range [0, 1]. The normalized password data is
supplied as input to a multi associative memory neural network with one
or more hidden layers. This produces the encoded password as real values
within the range [0, 1]. The decoded data can be converted back into character
notation (for memorization and backup) by applying a denormalization function.
The guessed password data is given as input to any one associative memory
algorithm with one or more hidden layers to produce decrypted data. If it
matches, the password is authenticated; otherwise it is invalid. After
training, all the weight values are stored in a weight table.
Whenever a user submits his login credentials, the network uses these weight
values to produce the output. A character set is defined by
(i) changing the order of the characters in a
particular character set and (ii) setting the maximum and minimum values for
the character set.
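The denormalization step mentioned above is not spelled out in the text. A minimal sketch, assuming it is simply the inverse of the normalization formula of Section 2.6.3 followed by rounding to the nearest unique code, could look like this; the function names and the use of the Table 2.4 character order are assumptions of the example.

```python
import string

ORDER = string.ascii_uppercase   # character order of Table 2.4
C_MIN, C_MAX = 11, 36            # minimum and maximum unique codes


def normalize(ch):
    """Character to probabilistic value in [0, 1]."""
    return ORDER.index(ch) / (C_MAX - C_MIN)


def denormalize(value):
    """Probabilistic value back to a character (assumed inverse mapping)."""
    code = round(value * (C_MAX - C_MIN) + C_MIN)  # nearest unique code
    return ORDER[code - C_MIN]


decoded = "".join(denormalize(v) for v in [0.04, 0.16, 0.12])
print(decoded)  # BED
```

Rounding to the nearest code tolerates small numerical errors in the network output, but an error larger than half a step (0.02 for this character set) would decode to the wrong character.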

Any organization that wants to use this novel password
authentication technique can define its own character set by changing
the order of the characters in the character set and giving its own
maximum and minimum values for the character set. If the organization
wants to use existing character sets such as ASCII or UNICODE, it can still
use our technique; in order to increase security, it can change
the order of the characters and also the weights.
The back propagation algorithm, the Hopfield method, the concepts
of bidirectional associative memory, and the brain-state-in-a-box model have
been used for both alphanumeric and graphical password schemes, by which
the level of security is enhanced.