Are deep artificial neural network architectures a suitable
approach for solving complex business-related problem
statements?
Florian Neukart
University of Brasov
Faculty of Electrical Engineering and Computer Science
Abstract
Yes. The implementation and application of computational intelligence paradigms for solving
complex everyday business problems is on the increase. The enormous amount of information
enterprises produce every day is processed by business applications and stored in numerous
databases and data warehouses. However, storing data is not the same as understanding the
knowledge it contains. Business Intelligence applications have made a very important contribution
to data understanding in everyday business life, but although reporting and analysis have greatly
helped to interpret data, this no longer suffices. In a time-critical world, knowledge at the right
time may decide everything. Thus, solutions capable of learning problem statements and gathering
knowledge from huge amounts of data, be it structured or unstructured, are required. This is where
computational intelligence and the introduced approach apply: a new approach combining
restricted Boltzmann machines and feed-forward artificial neural networks is elucidated, and the
accuracy of the resulting solution is demonstrated.
Introduction
A significant need exists for techniques and tools with the ability to intelligently assist humans in
analyzing very large collections of data in search of useful knowledge; in this sense, the area of
data mining (DM) has received special attention [1]. Data mining is the process of discovering
valuable information from observational data sets, which is an interdisciplinary field bringing
together techniques from databases, machine learning, optimization theory, statistics, pattern
recognition, and visualization [2]. As the term indicates, practically applied data mining functions
and algorithms are tools to detect (mine) coherencies in unmanageable amounts of raw data. DM
finds application in many fields of industry and science, and even in daily life; examples include
customer acquisition, weather prediction, and the evaluation of user-specific click paths in online
portals for offering users products of interest. Thus, the aim of data mining is to extract knowledge
from huge amounts of data. Besides conventional statistical functions such as correlation and
regression, methods from signal theory, pattern recognition, clustering, computational
neuroscience, fuzzy systems, evolutionary algorithms, swarm intelligence, and machine learning
are applied [3]. Computationally intelligent systems in particular allow the implementation of
pattern recognition, clustering, and learning through the application of evolutionary algorithms
and paradigms of computational neuroscience. Computational intelligence (CI) is a subfield of
artificial intelligence (AI) and needs further explanation.
AI as a whole tries to make systems behave as humans do, whereas CI relies on evolutionary
approaches to solve, among others, problems well suited to computers, such as detecting
similarities in huge amounts of data or optimization problems. Within the field of AI robotics, CI
approaches find application in ensuring robust control, planning, and decision making [4,5,6,7,8].
CI techniques have experienced tremendous theoretical growth over the past few decades and have
also gained popularity in application areas such as control, search and optimization, data mining,
knowledge representation, signal processing, and robotics [9].
However, before discussing the application of CI paradigms in the following chapters, it must
first be clearly defined what the umbrella term CI means in the field of DM. The term itself is
highly controversial in research and used in different manners. Fulcher et al. and Karplus provide
widely accepted definitions, which are also suitable within the context of this elaboration:
Nature-inspired method(s) + real-world (training) data = computational intelligence [10].
CI substitutes intensive computation for insight into how the system works. Neural
networks, fuzzy systems and evolutionary computation were all shunned by classical
system and control theorists. CI umbrellas and unifies these and other revolutionary
methods [11].
Another definition of computational intelligence emphasizes the ability to learn, to deal
with new situations, and to reason [12].
The SHOCID project
The introduced approach is successfully applied in a prototypical DM software system named
SHOCID [13,14,15,16,17], originating from a research project carried out at the Transilvania
University of Brasov. SHOCID, short for "System applying High Order Computational
Intelligence" in data mining, is a generic, knowledge-based neurocomputing (KBN) [18] software
system applying computational intelligence paradigms, methods, and techniques in the field of
data mining. KBN, and consequently SHOCID, concerns the use and representation of symbolic
knowledge within the neurocomputing paradigm [19]. Thus, the focus of SHOCID is on
practically applying methods to encode prior knowledge, and to extract, refine, and revise
knowledge embedded within one or multiple ANNs [20].
Accordingly, the overall aim of SHOCID is to provide highly complex computational
intelligence techniques for mining data, as well as a highly intuitive, user-optimized interface, so
that the user merely needs to connect the data source and choose the type of processed output
(e.g. a number x of clusters). The operator needs to know neither that, for example, a
NeuroEvolution of Augmenting Topologies approach led to the desired outcomes, nor how such
an approach works. In addition, SHOCID is a generic system: it is able not only to solve clearly
specified DM problems, but also to process most presented input data, regardless of what they
assert. SHOCID's core is therefore an artificial immune system, a combination of methods and
techniques applied to self-prepared data for recognizing patterns of any kind, similar to a human
immune system. Based on the presented data, the system is also able to take decisions on its own,
namely to choose which paradigm or combination of techniques fits best. This is no longer merely
supportive, like preparing data in a way that allows humans to make decisions (business
intelligence), but problem-based, individual behaviour. Most human-designed systems, maybe
even all of them, are complex, functioning networks which evolved under adaptive conditions
[18], and so is the introduced system, except that it carries out its evolution itself. This makes
SHOCID a system empowered to come to intelligent decisions on its own, which has been
characterized as decision intelligence [13].
Fundamentals
To understand the introduced approach, one requires some basic knowledge of two special kinds
of artificial neural networks (ANNs) to which Geoffrey Hinton [22,23] has devoted considerable
research effort:
Boltzmann machines [22] and
deep belief networks [23].
The former belong to the class of fully connected ANNs: each neuron of such an ANN features
connections to every other neuron except itself (no self-connections exist), with as many input
connections as output ones. Thus, a fully connected ANN has no processing direction like a
feed-forward ANN has. Two very popular and simple fully connected ANNs are the
Hopfield network [24] and the
Boltzmann machine.
Especially the latter type is of utmost importance for SHOCID, as it is the foundation of the
system's deep belief networks for classification. Both network types are presented an initial
pattern through their input neurons, which does not differ from MLPs. However, according to the
weight calculations, a new pattern is received from the output neurons. The difference to other
ANNs is that this pattern is fed back to the input neurons, which is possible because the input and
output layer are the same. This cycle continues as long as it takes the network to stabilize. Until
stabilization, the network is in a new state after each iteration, which is used for comparing the
actual pattern with the input vector [25]. Both Hopfield ANNs and Boltzmann machines belong to
the class of thermal ANNs, as they feature a stochastic component, the energy function, which is
applied similarly to simulated annealing learning.
Energy-based probabilistic models define a probability distribution through an energy function, as
follows:
p(x) = \frac{e^{-E(x)}}{Z} (1)
, where Z is the normalization factor, called the partition function:
Z = \sum_x e^{-E(x)} (2)
An energy-based model can be learnt by performing (stochastic) gradient descent on the empirical
negative log-likelihood of the training data.
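Equations (1) and (2) can be checked numerically on a toy example; the energy values below are arbitrary illustrative assumptions, not learnt parameters:

```python
import numpy as np

# Toy energy function over four discrete states.
energies = np.array([1.0, 2.0, 0.5, 3.0])

# p(x) = exp(-E(x)) / Z, with Z the partition function of equation (2).
unnormalized = np.exp(-energies)
Z = unnormalized.sum()
p = unnormalized / Z

print(p.sum())     # 1.0 -- a valid probability distribution
print(p.argmax())  # 2   -- the lowest-energy state is the most probable
```

The defining property is visible directly: lower energy means higher probability, and dividing by Z makes the values sum to one.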
Excursus to the log-likelihood
As the log-likelihood is of utmost importance here, it will be explained in detail, starting with its
connection to the likelihood function. In general, the likelihood function is used within the
maximum likelihood method for the estimation of the parameters of a density or probability
distribution. Let us assume X is a stochastic variable with a probability function
f(x; \theta) (3)
, where \theta is an unknown parameter that may be multidimensional and x_1, \dots, x_n are
observed values of X. The likelihood function is the function that assigns each value of this
parameter the value
l(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta) (4)
The log-likelihood function is simply the logarithm of the likelihood function. As it is used within
the maximum likelihood estimation, the first and second derivatives with respect to \theta have to
be calculated, as well as the zeros of the first derivative. This is easier when using the
log-likelihood
\ln l(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta) (5)
This is because the derivative of the log-likelihood with respect to \theta is simpler than that of
the likelihood. Generally, if \hat{\theta} is a maximum of the log-likelihood function, then it is
also a maximum of the likelihood function and vice versa [26]. The basic idea behind the
maximum likelihood method is very simple, as follows. We seek that value of the unknown
\theta for which the function l is as large as possible. If l is a differentiable function of \theta, a
necessary condition for l to have a maximum in an interval (not at the boundary) is
\frac{\partial l}{\partial \theta} = 0 (6)
The derivative is partial, because l also depends on x_1, \dots, x_n. A solution of (6) depending
on x_1, \dots, x_n is called a maximum likelihood estimate for \theta. Thus, (6) may be replaced
by
\frac{\partial \ln l}{\partial \theta} = 0 (7)
because a maximum of l is in general positive, and \ln l is a monotone increasing function of l,
which often simplifies calculations [27].
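The maximum likelihood idea can be illustrated numerically; the Bernoulli model and the sample below are illustrative assumptions:

```python
import numpy as np

# Illustrative Bernoulli sample; f(x; theta) = theta^x * (1 - theta)^(1 - x).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def log_likelihood(theta):
    # ln l(theta) = sum_i ln f(x_i; theta), equation (5)
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# For the Bernoulli model, the zero of the first derivative (equation 7) is
# the sample mean; a grid search over theta confirms this numerically.
grid = np.linspace(0.01, 0.99, 981)
theta_hat = grid[np.argmax([log_likelihood(t) for t in grid])]
print(round(theta_hat, 2), x.mean())  # 0.75 0.75
```

Working with the sum of logarithms instead of the product of densities is exactly the simplification that motivates equation (7).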
As for logistic regression, first the log-likelihood (8) and then the loss function (9), being the
negative log-likelihood, need to be defined:
\mathcal{L}(\theta, \mathcal{D}) = \sum_{i} \ln P(Y = y^{(i)} \mid x^{(i)}, \theta) (8)
\ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D}) (9)
, using the stochastic gradient
-\frac{\partial \ln P(Y = y^{(i)} \mid x^{(i)}, \theta)}{\partial \theta} (10)
, where \theta are the parameters of the model [28].
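A minimal sketch of stochastic gradient descent on the negative log-likelihood of equations (8)-(10); the synthetic data, seed, and learning rate are illustrative assumptions:

```python
import numpy as np

# Toy linearly separable data: label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = np.zeros(2)
lr = 0.1
for epoch in range(50):
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ theta))  # P(Y = 1 | x, theta)
        # For one example, the gradient of the negative log-likelihood
        # with respect to theta is (p - y) * x.
        theta -= lr * (p - yi) * xi

preds = (1.0 / (1.0 + np.exp(-X @ theta)) > 0.5).astype(float)
print((preds == y).mean())  # training accuracy close to 1.0
```

Each inner-loop update applies the stochastic gradient of equation (10) for a single training example, rather than the full-batch gradient.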
Boltzmann machine
At first glance, a Boltzmann machine seems to be identical to a Hopfield ANN. However, it
contains hidden neurons. Boltzmann machines are a particular form of log-linear Markov random
field, i.e., one for which the energy function is linear in its free parameters. To make them
powerful enough to represent complicated distributions, it is assumed that some of the variables
are never observed; these are represented by the above-mentioned hidden neurons. By adding
more hidden variables, the modelling capacity of the Boltzmann machine can be increased [28].
The hidden neurons require adaptations to the above equations, with respect to the hidden part h
and the observed part x:
p(x) = \sum_h p(x, h) = \sum_h \frac{e^{-E(x, h)}}{Z} (11)
To map this equation to one similar to equation (1), the notion of free energy must be introduced:
\mathcal{F}(x) = -\ln \sum_h e^{-E(x, h)} (12)
, leading to
p(x) = \frac{e^{-\mathcal{F}(x)}}{Z} (13)
, where
Z = \sum_x e^{-\mathcal{F}(x)} (14)
The data negative log-likelihood gradient then has the form:
-\frac{\partial \ln p(x)}{\partial \theta} = \frac{\partial \mathcal{F}(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta} (15)
The gradient contains two terms, referring to the positive and negative phases. The terms positive
and negative do not refer to the sign of each term in the equation, but rather reflect their effect on
the probability density defined by the model. The first term increases the probability of training
data (by reducing the corresponding free energy), while the second term decreases the probability
of samples generated by the model [28].
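The free energy and the two phases of the gradient can be sketched for a tiny binary RBM. The network sizes, initial weights, and the single-Gibbs-step (contrastive divergence, CD-1) approximation of the negative phase are illustrative assumptions; the quantities themselves follow [28]:

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy(v):
    # For a binary RBM, equation (12) has the closed form
    # F(v) = -b'v - sum_j ln(1 + exp(c_j + v'W_j)).
    return -v @ b - np.sum(np.log1p(np.exp(c + v @ W)))

v0 = rng.integers(0, 2, n_visible).astype(float)

# Positive phase: hidden activations given the training vector.
h0 = sigmoid(c + v0 @ W)
# Negative phase, approximated by one Gibbs step: the model's reconstruction.
v1 = (sigmoid(b + W @ h0) > rng.random(n_visible)).astype(float)
h1 = sigmoid(c + v1 @ W)

# Equation (15): data-dependent term minus model-dependent term.
grad_W = np.outer(v0, h0) - np.outer(v1, h1)
W += 0.1 * grad_W  # raises p(v0) relative to samples from the model
```

The positive term pushes the free energy of the training vector down, while the negative term pushes up the free energy of configurations the model itself generates, matching the interpretation of the two phases given above.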
Graphically, a Boltzmann machine can be described as follows (Figure 1 - Boltzmann machine):
[Figure 1 - Boltzmann machine: neurons i1 ... in and h1 ... hn, with connection weights
w_i1h1 ... w_inhn]
Every neuron is connected to every other neuron, without self-connections. The input neurons are
at the same time the output neurons, thus both consuming the input vector and presenting the
output vector.
A further development of the Boltzmann machine is the so-called restricted Boltzmann machine
(RBM) [29]. An RBM consists of a layer of visible units and a layer of hidden units with no
visible-visible or hidden-hidden connections. With these restrictions, the hidden units are
conditionally independent given a visible vector, so unbiased samples from the data can be obtained in one