LARGE DATA CLASSIFICATION USING
NEURAL NETWORKS
A
MINI PROJECT
SUBMITTED TO
COMPUTER SCIENCE DEPARTMENT OF THE
UNIVERSITY OF AGRICULTURE, ABEOKUTA
BY
ADELANI DAVID IFEOLUWA, 06/1166
AIGBERUA TOBI DEBORAH, 06/1172
IWOBHO AKHAZE ANTHONY, 06/1199
OWODUNNI ELIAS ADEFARASIN, 06/1223
COURSE: CSC 328 (COMPUTER APPLICATIONS)
SUPERVISED BY:
DR ADEWOLE PHILIPS.
ABSTRACT
A three-layer artificial neural network (ANN) model trained with the back-propagation (BP) algorithm
was used to classify the customers of a German automobile company into various categories. The
classification covers three divisions of the company, namely Germany, South Africa and the
Maldives. The primary aim was to sort the customers into three categories, namely good, average
and below average, on the basis of invoicing data; such classification is an important component
of data mining. The classification of data using neural networks takes the customers' day-to-day
invoicing data as its base. Intelligent data is obtained from raw data through a process of data
cleaning and relevance analysis. Extraction of the data depends on a number of factors, such as
which customers order the maximum invoicing quantity in each of the three source systems. The
intelligent data then undergoes conditioning, averaging, preparing and normalizing; normalization
makes the data suitable for use in a three-layer feed-forward ANN trained with the back-propagation
algorithm. Over a number of iterations on the "supervised" input/output training pairs, the ANN
learns to master the classification of customer data. The error in each iteration is fed back to
adjust the weights of the previous layer, which makes the network an accurate classifier. The ANN
uses different learning-rate annealing schedules, various numbers of nodes in the hidden layer and
different activation functions, which not only provide various rates of error convergence but also
measure the confidence and support in data mining, helping to predict a customer's category.
For this test sample T, the MLFFNN should correctly classify the customer invoicing data in the
category of good customers.
3.5 IMPLEMENTATION OF NEURAL NETWORK USING MATLAB 2008
MATLAB is software that can easily be used to implement artificial neural networks; other
programming languages such as Java and C# can also be used. MATLAB simplifies the programming of
mathematical functions and neural networks, so little code needs to be written. In this work, a
MAT-file (which allows you to input an "m x n" matrix in the format of an Excel file) was created
to hold the training input (INPUT), the training output (RESULT), the test input (testINPUT) and
the test output (testRESULT). The data is given in section 3.4.2 (Initializing the weights and
training sample).
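By analogy, the same data layout can be sketched in Python with NumPy (a hypothetical stand-in for the MAT-file; the variable names mirror the MATLAB code, and the values are the training and test samples printed in the program output later in this section):

```python
import numpy as np

# Training input: 5 normalized invoicing features per customer,
# one column per training sample (mirrors the MATLAB INPUT matrix).
INPUT = np.array([
    [0.4301, 0.0803, 0.0479],
    [0.4889, 0.0940, 0.0461],
    [0.3084, 0.0122, 0.0025],
    [0.8013, 0.7013, 0.3062],
    [0.4851, 0.0984, 0.0408],
])

# Training targets: one-hot columns for the three classes
# good / average / below average (mirrors RESULT).
RESULT = np.eye(3)

# A single test sample and its expected one-hot target.
testINPUT = np.array([[0.3001], [0.4089], [0.2084], [0.8512], [0.6512]])
testRESULT = np.array([[1], [0], [0]])

print(INPUT.shape, RESULT.shape, testINPUT.shape)
```

Each column of INPUT is one customer's feature vector, and the matching column of RESULT marks the class it belongs to.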
A neural network can be implemented in MATLAB in three ways:
(1) Using command-line functions
(2) Using the Neural Network Toolbox™ pattern recognition tool GUI (nprtool)
(3) Using the graphical user interface (nntool)
This discussion is limited to the first two methods of implementation because the graphical user
interface (nntool) is basically used for the multilayer perceptron.
1. Using command-line functions
After inputting all the data needed, the next steps are as follows:
Create a new M-file, which takes the form of a text editor.
Write a neural network code for pattern recognition using a chosen training algorithm; here,
the Scaled Conjugate Gradient (trainscg) algorithm was used.
The code is written below.
PROGRAM
Backpropagation Algorithm for multilayer feed forward artificial neural network
fprintf('INPUT represents training input while RESULT represents training output');
INPUT
RESULT
net = newpr(INPUT,RESULT,4,{},'trainscg'); % using Scaled Conjugate Gradient (trainscg)
[net,tr] = train(net,INPUT,RESULT); % Training of the network
fprintf('to test the neural network, the RESULT needs to be tested with the result of the network - which appears below');
outInput = sim(net,INPUT)
testINPUT
testRESULT
fprintf('the result of testing data appears below');
outTest = sim(net,testINPUT)
if round(outTest) == [1; 0; 0]
    disp('this implies that the customer is in a GOOD category');
end
plotperf(tr)
plotconfusion(RESULT,outInput)
[y_out,I_out] = max(outTest);
[y_t,I_t] = max(testRESULT);
diff = [I_t - 3*I_out];
g_g = length(find(diff==-2));
g_a = length(find(diff==-3));
g_b = length(find(diff==-1));
a_a = length(find(diff==0));
a_g = length(find(diff==3));
a_b = length(find(diff==-3));
b_b = length(find(diff==2));
b_g = length(find(diff==-1));
b_a = length(find(diff==0));
N = size(testINPUT,2); % Number of testing samples (columns of testINPUT)
fprintf('Total testing samples: %d\n', N);
cm = [g_g g_a g_b; a_a a_g a_b; b_b b_g b_a]
cm_p = (cm ./ N) .* 100 % classification matrix in percentages
fprintf('Percentage Correct classification : %f%%\n', 100*(cm(1,1)+cm(2,2)+cm(3,3))/N);
fprintf('Percentage Incorrect classification : %f%%\n', 100*(cm(1,2)+cm(2,1)+cm(1,3)+cm(3,1)+cm(2,3)+cm(3,2))/N);
The output of the code appears on the command line; an interface of the training is shown, along
with the confusion matrix. The network performance is also plotted. The output is shown below.
OUTPUT OF THE PROGRAM IN COMMAND-LINE
INPUT represents training input while RESULT represents training output
INPUT =
0.4301 0.0803 0.0479
0.4889 0.0940 0.0461
0.3084 0.0122 0.0025
0.8013 0.7013 0.3062
0.4851 0.0984 0.0408
RESULT =
1 0 0
0 1 0
0 0 1
To test the neural network, the RESULT needs to be tested with the result of the network -which appears below
outInput =
1.0000 0.0003 0.0000
0.0007 0.9993 0.0007
0.0001 0.0006 0.9994
testINPUT =
0.3001
0.4089
0.2084
0.8512
0.6512
testRESULT =
1
0
0
the result of testing data appears below
outTest =
1.0000
0.0025
0.0001
this implies that the customer is in a GOOD category
Total testing samples: 1
cm =
1 0 0
0 0 0
0 0 0
cm_p =
100 0 0
0 0 0
0 0 0
Percentage Correct classification : 100.000000%
Percentage Incorrect classification : 0.000000%
>>
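For readers without MATLAB, the training procedure above can be sketched in pure Python/NumPy, replacing the toolbox's scaled conjugate gradient with plain gradient-descent back-propagation (an assumption made for simplicity; the learning rate, epoch count and random seed here are illustrative, not the project's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: 5 invoicing features per sample, one column per sample.
INPUT = np.array([
    [0.4301, 0.0803, 0.0479],
    [0.4889, 0.0940, 0.0461],
    [0.3084, 0.0122, 0.0025],
    [0.8013, 0.7013, 0.3062],
    [0.4851, 0.0984, 0.0408],
])
RESULT = np.eye(3)  # one-hot targets: good / average / below average

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three-layer network: 5 inputs -> 4 hidden neurons -> 3 outputs.
W1 = rng.normal(0, 0.5, (4, 5)); b1 = np.zeros((4, 1))
W2 = rng.normal(0, 0.5, (3, 4)); b2 = np.zeros((3, 1))

lr = 0.5  # illustrative learning rate
for epoch in range(5000):
    # Forward pass
    H = sigmoid(W1 @ INPUT + b1)   # hidden activations, shape (4, 3)
    O = sigmoid(W2 @ H + b2)       # network outputs, shape (3, 3)

    # Backward pass: the output error is fed back to adjust the
    # weights of the previous layer.
    dO = (O - RESULT) * O * (1 - O)     # output-layer delta
    dH = (W2.T @ dO) * H * (1 - H)      # hidden-layer delta
    W2 -= lr * dO @ H.T;     b2 -= lr * dO.sum(axis=1, keepdims=True)
    W1 -= lr * dH @ INPUT.T; b1 -= lr * dH.sum(axis=1, keepdims=True)

mse = np.mean((O - RESULT) ** 2)
print("final training MSE:", round(float(mse), 6))
print("predicted class per sample:", O.argmax(axis=0))
```

The delta rules are the standard back-propagation update for a sigmoid network with a squared-error loss; the toolbox's trainscg algorithm would converge faster but follows the same error-feedback principle.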
2. Using the Neural Network Toolbox™ pattern recognition tool GUI (nprtool)
(i) The data inputted in the MAT-file is used.
(ii) The command "nprtool" is typed on the command line, which opens the interface of the Neural
Network Pattern Recognition tool used for classification of data.
(iii) The input and target data are inputted; they can be accessed directly from the system
using the "browse" button.
(iv) By pressing next, you input the number of neurons in the hidden layer; for our project we
inputted four neurons.
(v) Next, an interface appears where you press the "train" button to train the neural network.
From the training interface in fig. 2, the performance and the confusion matrix can be
plotted, giving the same output as the command-line method.
(vi) The test input and test output are fed into the neural network and the network is tested
(test network).
Fig. 2
The training toolbox has the following characteristics:
(1) Performance plotting: used to plot the mean square error (MSE) against the epoch (one
presentation of the entire training set to the neural network), as shown in the graph below.
Fig. 3
(2) Confusion matrix: used to plot the output matrix (the output of the neural network) against
the target matrix (the output of the training data). The values on the diagonal (green)
represent data that are correctly classified, while those in red represent data that are
misclassified.
(3) Simulate data: the command "sim" is used to test whether the neural network has truly
learned from the training data.
(4) Mean square error: the average squared difference between outputs and targets. Lower
values are better; zero means no error.
(5) Percent error: the fraction of samples that are misclassified. A value of 0 means no
misclassifications; 100 indicates maximum misclassification.
Fig. 4
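As a hedged illustration, the three quantities above (MSE, percent error and the confusion matrix) can be computed directly from network outputs and targets; the small output matrix below is an illustrative stand-in, not the project's actual data:

```python
import numpy as np

# Illustrative network outputs (one column per sample) and one-hot targets.
outputs = np.array([[0.98, 0.02, 0.01],
                    [0.01, 0.95, 0.08],
                    [0.01, 0.03, 0.91]])
targets = np.eye(3)

# Mean square error: average squared difference between outputs and targets.
mse = np.mean((outputs - targets) ** 2)

# Predicted and true class indices are the largest entry in each column.
pred = outputs.argmax(axis=0)
true = targets.argmax(axis=0)

# Percent error: fraction of misclassified samples, as a percentage.
percent_error = 100.0 * np.mean(pred != true)

# Confusion matrix: rows = true class, columns = predicted class;
# correct classifications accumulate on the diagonal.
cm = np.zeros((3, 3), dtype=int)
for t, p in zip(true, pred):
    cm[t, p] += 1

print("MSE:", round(float(mse), 4))
print("Percent error:", percent_error)
print(cm)
```

For these stand-in outputs every sample lands on the diagonal, so the percent error is zero even though the MSE is small but non-zero, which matches the distinction drawn in the results below.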
4.0 RESULTS AND DISCUSSION
The neural network used for the data classification has proved to be a good classifier. The
efficiency of the neural network depends on the number of neurons in the hidden layer. There is
no formula for selecting the number of neurons; it is chosen mainly by trial and error. In
general, more neurons make the network more efficient (accurate), but too large a number of
neurons can complicate the classification. Having tried different numbers of neurons, we used
four (4), which provided accurate classification since our input data is not too large.
The neural network output is tested to be an accurate classifier using the mean square error
(MSE) and the percent error. The MSE obtained was 1.26518e-7, the average squared difference
between the output (the network's result for the training input, outInput in the code) and the
target (RESULT in the code); the function 'sim()' (simulate) is used to get the output data when
using code. Hence, the classification is good. An error of exactly zero cannot be obtained
because the neural network cannot be 100% accurate. The percent error is zero, which indicates
no error in classification; this is shown diagrammatically in the confusion matrix. The test
data gave an MSE of 4.90938e-7 and a percent error of zero, since the neural network classified
the test data well.
5.0 CONCLUSION
From the entire analysis of the classification of customer invoicing data with the help of a
multilayer feed-forward neural network (MLFFNN), the following conclusions were made:
1. The framework for classifying data into distinct classes is independent of the entities used
as examples (customers, parts, etc.), and thus the analysis is very general in nature.
2. After the MLFFNN learns to classify the customer invoicing data, it can serve as a
forecasting tool, where early invoicing data for an unknown customer would serve to forecast
that customer's classification in days to come.
The neural network has proven to be a good classifier, predictor and forecaster of data. It
overcomes the limitations of statistical methods for analysing data and also provides the
advantages of high accuracy, noise tolerance and ease of maintenance.